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I.  INTRODUCTION 


The  design  of  integrated  circuits  requires  an  accurate  method  of 
predicting  circuit  performance.  The  traditional  breadboard  method  is 
not  able  to  satisfy  the  above  requirement  because  of  the  fact  that  the 
parasitic  components  that  are  present  in  the  breadboard  are  entirely 
different  from  the  parasitic  components  that  are  present  in  integrated 
circuits,  so  a  circuit  simulation  program  is  a  must.  Conventional 
circuit  simulation  programs  [1-11]  possess  two  serious  limitations:  a 
computer  storage  requirement  and  a  computing  time  requirement,  so  the 
size  of  the  circuit  that  can  be  simulated  is  limited.  With  the  advances 
of  circuit  simulation  techniques,  the  size  of  the  circuit  that  can  be 
simulated  has  increased;  but  the  simulation  of  large  scale  integrated 
(LSI)  circuits  is  still  beyond  the  capabilities  of  present  circuit 
simulation  programs. 

The  goal  of  this  research  was  to  study  new  approaches  to  the 
simulation  of  integrated  circuits  which  can  alleviate  the  ab^>ve 
two  limitations,  namely  the  repetitiveness  and  latency 
properties  of  digital  integrated  circuits.  Since  a  DEC-10  version  of 
SPICE2  was  available  to  us,  it  was  decided  that  this  program  would 
serve  as  a  vehicle  for  testing  our  algorithms.  However,  in  the  initial 
phases  of  our  research,  it  was  found  that  our  version  of  SPICE2  had 
several  deficiencies  in  the  implementation  of  some  of  its  algorithms 
which  occasionally  caused  numerical  difficulties.  In  order  to  resolve 
these  difficulties  a  new  reordering  scheme  for  the  modified  nodal 
approach  was  developed,  a  new  concept  -  a  piecewise  nonlinear  approach 
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-  for  the  Newton-Raphson  iteration  was  proposed,  and  two  problems  with 
the  numerical  integration  algorithm  were  resolved.  The  new  reordering 
scheme  for  the  modified  nodal  approach  not  only  avoids  zero  diagonal 
pivot  elements  which  increases  numerical  accuracy,  but  it  also  signi¬ 
ficantly  reduces  the  number  of  fills  in  the  matrix  which  reduces  the 
computational  cost.  The  piecewise  nonlinear  approach  reduces  the  number 
of  iterations  needed  to  find  the  solution  of  a  nonlinear  circuit  and 
improves  the  global  convergence  property  of  Newton-Raphson  method.  The 
resolution  of  the  two  problems  with  the  numerical  integration  algorithm 
provides  more  efficiency  and  accuracy.  All  of  these  new  developments 
result  in  a  modified  version  of  SPICE2  (YSPICE) ,  which  is  2  to  5  times 
faster  than  SPICE2. 

Although  YSPICE  is  more  efficient  and  more  accurate  than  SPICE2, 
it  is  still  not  powerful  enough  to  handle  LSI  circuits  simulation 
problems.  Experience  has  shown  that  LSI  circuits  possess  properites 
which  can  be  exploited  to  improve  the  storage  and  computing  time 
requirements.  The  two  properties  are  the  repetitiveness  of  a  limited 
number  of  subcircuits  and  the  latency  that  may  exist  within  parts  of 
the  circuits  during  an  analysis.  Conventional  circuit  simulation 
programs  do  not  exploit  these  two  properties,  so  all  of  the  node 
voltages  or  branch  voltages  and  currents  are  calculated  at  each 
iteration  and  each  timepoint.  In  order  to  increase  the  capabilities  of 
circuit  simulation  programs  substantially,  these  two  properties  must 
be  fully  exploited.  When  the  first  property  is  exploited  both  computer 
storage  requirements  and  computing  time  can  be  reduced  in  several  ways. 
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First,  only  one  subcircuit  description  for  each  type  of  repetitive 
subcircuit  need  be  stored;  secondly,  only  one  set  of  small  submatrix 
sparse  matrix  pointers  for  each  type  of  repetitive  subcircuit  is 
needed  so  that  both  storage  and  preprocessing  time  can  be  saved; 
thirdly,  if  one  type  of  subcircuit  is  linear,  then  the  LU  factorization 
of  that  type  of  subcircuit  need  be  found  once  only.  When  the  second 
property  is  exploited,  we  only  need  to  solve  for  the  active  parts  of 
the  circuit  and  this  reduces  the  computational  effort  considerably. 
Tearing  methods,  first  introduced  by  Kron  [12],  are  well  suited  for  the 
exploitation  of  these  two  properites  as  well  as  the  sparsity  of  the  net¬ 
work.  Recently,  the  use  of  tearing  methods  and  latency  [13-24]  has 
been  studied  to  exploit  these  two  properties,  but  in  order  to  fully 
exploit  these  two  properties  more  research  effort  is  needed. 

In  the  second  stage  of  our  research,  these  two  problems  were 
studied  extensively  and  the  result  of  our  investigations  is  a  new 
general  purpose  circuit  simulation  program  SLATE  (a  Simulator  with 
Latency  and  Tearing).  SLATE  evolved  from  YSPICE,  so  it  has  all  the 
good  features  of  YSPICE:  in  addition,  several  new  approaches  are  used. 
First,  the  new  reordering  strategy  for  the  modified  nodal  approach  is 
used  at  both  the  subcircuit  and  interconnection  levels;  secondly,  ways 
of  exploiting  sparsity  that  exist  at  the  subcircuit  and  interconnection 
levels  were  theoretically  and  experimentally  studied  and  the  most 
efficient  way  is  used;  thirdly,  node  tearing  is  used  such  that  the 
program  is  more  efficient  and  the  final  equation  formulation  is  suit¬ 
able  for  latency  exploitation  and  parallel  processing;  fourthly, 
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latency  in  he  Newton-Raphson  iterations  is  exploited  not  only  at  device 
and  subcircuit  levels,  but  also  at  interconnection  levels;  fifthly, 
latency  in  the  time  domain  is  exploited  not  only  at  device  and  subcircuit 
levels,  but  also  at  interconnection  levels;  sixthly,  three  latency  in 
time  criteria  schemes  were  studied  thoroughly  in  relation  to  the  spread 
of  the  time  constants  in  the  subcircuits  and  the  best  scheme  was  deter¬ 
mined;  and  lastly,  the  interconnection  matrix  formulation  method  is 
general  enough  to  accommodate  the  situation  when  there  are  no  subcircuits 
specified  in  the  network  or  when  the  interconnection  circuits  consist  of 
more  than  tearing  nodes. 

Both  YSPICE  and  SLATE  are  written  in  FORTRAN  and  have  a  SPICE-like 
input  language  for  user  convenience.  If  no  subcircuits  are  used,  then 
the  methods  of  analysis  of  SLATE  is  equivalent  to  that  of  YSPICE,  that 
is,  YSPICE  is  a  subset  of  this  new  program  SLATE.  Simulation  results 
indicate  that  the  speed  of  SLATE  is  about  an  order  of  magnitude  faster 
than  SPICE2,  and  the  output  results  are  either  the  same  as  or  more  ac¬ 
curate  than  those  of  SPICE2. 

The  new  reordering  scheme  for  the  modified  nodal  approach  is 
described  in  Chapter  2,  and  the  comparison  between  this  new  scheme  and 
that  used  in  SPICE2  is  given  The  piecewise  nonlinear  approach  is 
explained  in  Chapter  3  and  simulation  results  are  given.  The  two 
problems  with  numerical  integration  are  detailed  in  Chapter  4,  and  the 
solution  is  given.  Chapter  5  introduces  the  concept  of  tearing  methods 
and  gives  the  sparsity  consideration  for  the  node  tearing  method. 

Chapter  6  describes  three  latency  criteria  and  gives  the  simulation 


results  of  these  three  schemes.  Finally,  in  Chapter  7  a  summary  of 
SLATE  performance  is  given,  the  conclusions  are  presented,  and  areas 
for  future  work  are  described. 


II.  NEW  REORDERING  STRATEGY  FOR  THE  MODIFIED  NODAL  APPROACH 


The  modified  nodal  approach  (MNA)  [25]  has  been  widely  used  in  many 
computer-aided  circuit  analysis  programs  [1,11,26,27]  for  formulating 
circuit  equations.  It  is  well  known,  however,  that  while  the  more 
restrictive  nodal  approach  in  general  produces  nonzero  diagonal  elements 
for  pivoting,  the  modified  nodal  approach,  although  more  general,  may 
produce  zero  diagonal  entries  in  the  network  matrix.  This  occurs,  for 
example,  when  the  circuit  contains  voltage  sources,  short -circuits, 
inductors  at  zero  frequency  (dc  solution)  and  some  types  of  controlled 
sources.  When  sparse  matrix  techniques  with  diagonal  pivoting  are  used  for 
solving  these  types  of  circuit  equations,  extreme  care  should  be  taken  so 
as  not  to  choose  a  zero-valued  pivot.  Two  methods  have  been  proposed  for 
avoiding  pivoting  on  these  zero  diagonal  entries.  One  method  (method  1) 
involves  ordering  the  rows  and  columns  with  zero  diagonal  entries  last,  in 
the  hope  that  they  will  be  filled  before  becoming  candidates  for  pivoting 
[1,11].  Another  method  (method  2)  involves  rearranging  and/or  combining 
rows  and  columns  in  order  to  obtain  nonzero  diagonal  elements  [25]. 
However,  as  we  show  below,  there  are  two  problems  with  these  methods. 
First,  even  if  all  the  zero  diagonal  elements  which  exist  in  the  network 
matrix  at  the  formulation  stage  are  avoided  or  filled  during  the 
elimination  stage,  it  is  possible  to  generate  zero  diagonal  elements  during 
the  Gaussian  elimination  process  regardless  of  the  values  of  the  circuit 
elements;  Secondly,  these  methods  usually  are  not  efficient.  For  example, 
forcing  the  zero-diagonal  entries  to  be  last  usually  increases  the  number 
of  fills  considerably. 
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In  this  chapter  a  new  reordering  scheme  for  the  modified  nodal 
approach  is  described  which  avoids  zero  diagonal  pivots  in  essentially  all 
practical  cases  and  is  very  efficient.  In  Section  2.1,  the  problems  with 
previous  methods  are  illustrated  and  explained.  In  Section  2.2,  the 
partitioning  of  the  circuit  variables  is  detailed  and  the  ordering  strategy 
is  introduced.  In  Section  2.3,  theorems  and  examples  are  given.  The 
implementation  of  this  new  scheme  resulted  in  YSPICE.  The  simulation 
results  from  YSPICE  are  given  in  Section  2.4.  In  this  Section  examples  are 
given  which  caused  computational  problems  in  our  DEC-10  version  of  SPICE2 
due  to  pivoting  on  zero  diagonal  elements,  but  which  were  successfully 
analyzed  by  YSPICE.  Also  the  number  of  fills  produced  by  YSPICE  is  much 
less  than  that  produced  by  SPICE2.  In  Section  2.5,  a  discussion  of  this 
new  ordering  strategy  is  given. 


2. '.  Problems  with  Previous  Methods 


The  MNA  matrix  can  in  general  be  expressed  in  the  form  [25] 


N 

B 

V 

S  N. 

J 

C 

D 

I 

E 

sV 

(2.1) 


where  V  is  the  set  of  node-to-datum  voltages  and  I  is  the  set  of  branch 
currents  which  are  chosen  as  additional  circuit  variables.  Yj^  is  a  reduced 
form  of  the  nodal  matrix  excluding  the  contributions  due  to  voltage 
sources,  current  controlling  elements,  etc.  B  contains  partial  derivatives 
of  the  Kirchhoff  current  equations  with  respect  to  the  additional  current 


variables  and  thus  contains  +1's  for  the  elements  whose  branch  relations 
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are  introduced.  The  branch  constitution  relations,  differentiated  with 
respect  to  the  unknown  vector  are  represented  by  the  matrices  £  and  _D.  J 
and  £  are  the  excitations. 

As  mentioned  above,  when  sparse  matrix  techniques  with  diagonal 
pivoting  are  used  for  solving  Eq.  (2.1),  zero  diagonal  elements  may  be 
encountered.  Previously,  two  methods  have  been  proposed  for  avoiding 
pivoting  on  these  zero  diagonal  elements.  However,  there  are  still  two 
problems  with  these  previous  methods:  (1)  zero  diagonal  elements  may  be 
generated  during  the  Gaussian  elimination  process,  and  (2)  the  methods  may 
not  be  the  most  efficient.  In  this  section  we  consider  the  zero  diagonal 
problem,  and  in  Section  2.4  we  discuss  the  efficiency  problem. 

Method  1  orders  the  rows  and  columns  with  zero  diagonal  entries  la3t, 
in  the  hope  that  they  will  be  filled  before  becoming  candidates  for 
pivoting.  Even  if  all  the  zero  diagonal  elements  which  exist  in  the 
network  matrix  at  the  formulation  stage  are  filled  during  the  elimination 
stage,  cutsets  of  branches  whose  currents  are  declared  as  network  variables 
in  a  modified  nodal  formulation  will  generate  zero  diagonal  elements  during 
the  Gaussian  elimination  process  regardless  of  the  values  of  the  circuit 
elements.  This  problem  is  proved  and  illustrated  by  Theorem  2.1,  Example 
2.1,  and  Example  2.2. 

Theorem  2.1.  For  any  network  which  has  cutsets  of  branches  whose  currents 
are  declared  as  circuit  variables  in  a  modified  nodal  formulation,  if  these 
current  variables  are  ordered  last,  then  zero  diagonal  elements  will  be 
generated  during  the  Gaussian  elimination  process,  regardless  of  circuit 
element  values. 


t 


i 


Proof:  Since  we  assume  that  all  the  current  variables  are  ordered  last  and 
they  form  cutsets,  therefore  floating  subnetworks  are  created.  The 
admittance  matrices  of  these  floating  subnetworks  are  singular,  therefore 
the  Yjj  in  Eq.  (2.1)  is  singular,  so  zero  diagonal  elements  will  be 
generated  during  the  Gaussian  elimination  of  Eq.  (2.1). 


The  following  Example  2.1  illustrates  Theorem  2.1. 

Example  2.1:  A  cutset  of  current  variables  (Fig.  2.1) 

If  the  method  which  orders  all  the  current  variables  last  is  U3ed  to 
formulate  the  modified  nodal  equations  of  the  circuit  shown  in  Fig.  2.1, 
the  resulting  equations  will  be  as  follows: 


S' 

N 

/  -s 

G1 

0 

1 

0 

V1 

0 

-G1 

G1 

0 

0 

-1 

V2 

0 

0 

0 

G2 

0 

1 

V3 

3 

0 

1 

0 

0 

0 

0 

E 

0 

-1 

1 

0 

0 

Ir 

0 

L 

During  the  course  of  Gaussian  elimination  due  to  the  resulting 
floating  subnetwork,  a  zero  diagonal  element  will  be  produced  at  location 
(2,2). 

Remark :  For  any  subnetwork  which  has  cutsets  of  branches  whose  currents 
are  declared  as  circuit  variables  in  a  modified  nodal  formulation,  if  the 


rows  corresponding  to  current  variables  which  have  zero-diagonal  elements 
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Fig.  2.1  Circuit  used  in  Example  2.1- 
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are  ordered  last  until  a  diagonal  entry  is  filled,  before  it  is  considered 
as  a  pivot,  then  zero  diagonal  elements  may  be  generated  during  the 
Gaussian  elimination  process,  regardless  of  circuit  element  values. 


The  proof  of  this  remark  is  the  same  as  that  of  Theorem  2.1.  In  the 
following,  Example  2.2  illustrates  this  remark. 

Example  2.2:  A  cutset  of  current  variables  (Fig.  2.2) 

If  the  reordering  strategy  mentioned  in  the  previous  remark  is  used  to 
formulate  the  equations  of  the  circuit  shown  in  Fig.  2.2,  the  matrix 
formulated  is: 


S'  > 

/  \ 

y  N 

G2  0  0  0 

V3 

0 

0  G.  -G,  1 

V, 

m 

0 

1  1 

1 

o  -gl  gl  0 

V2 

0 

0  10  0 

h 

E 

<  / 

s  / 

During  the  course  of  Gaussian  elimination  due  to  the  resulting 
floating  subnetwork,  a  zero  diagonal  element  will  be  generated  at  location 
(3,3). 

Method  2  interchanges  rows  in  order  to  obtain  nonzero  diagonal 
elements.  Even  if  all  the  zero  diagonal  elements  which  exist  in  the 
network  matrix  at  the  formulation  stage  are  avoided  before  the  elimination, 
if  there  are  loops  of  branches  whose  currents  are  declared  as  network 
variables  in  the  modified  nodal  formulation,  then  zero  diagonal  elements 
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may  be  generated  during  the  Gaussian  elimination  process  regardless  of  the 
values  of  the  circuit  elements.  This  problem  is  proved  and  illustrated  by 
Theorem  2.2  and  Example  2.3* 

Let  us  define  the  branch  whose  current  is  declared  as  a  current 
variable  in  the  modified  nodal  formulation  as  current  branch.  Let  us 
define  the  'positive'  node  as  follows:  Assuming  that  the  datum  node  can 
not  be  chosen  as  'positive'  and  that  the  datum  node  is  not  contained  in  any 

loop  formed  by  current  branches,  then  we  can  always  choose  one  of  the  two 
nodes  of  a  current  branch  as  'positive'  for  that  current  branch  and  there 
is  a  one-to-one  correspondence  between  these  'positive'  nodes  and  the 
current  branches.  An  algorithm  for  choosing  'positive'  nodes  is  given  in 
Section  2.2. 

Theorem  2.2.  For  any  network  with  a  loop  of  branches  whose  currents  are 
declared  as  network  variables  in  a  modified  nodal  formulation,  and  the 
reference  node  is  not  contained  in  the  loop  and  there  i3  no  coupling  among 
the  voltages  of  the  branches  in  the  loop,  then  if  all  the  rows 
corresponding  to  the  current  variables  are  interchanged  with  the 
corresponding  'positive'  node  voltage  rows,  zero  diagonal  elements  will  be 
generated  during  the  Gaussian  elimination  process,  regardless  of  circuit 
element  values. 

Proof:  Let  us  assume  that  after  the  rows  corresponding  to  the  current 
variables  are  interchanged  with  the  corresponding  'positive'  node  voltage 
rows,  the  rows  corresponding  to  the  current  variables  are  ordered  first, 
then  the  MNA  matrix  equation  (2.1)  is  transformed  into 


t 


-14- 


N 

/  \ 

/  \ 

Y  Y,  , 

I 

J, 

-112  -in 

~22  ~21 

~2 

CM 

c„  c, 

V, 

E 

~2  ~1 

~ 

(2.2) 


The  submatrix  being  eliminated  first  is  the  node-to-branch  incidence  matrix 
for  the  'positive'  nodes  and  the  current  variable  branches  [28],  that  is  , 
the  B^in  Eq.  (2.2).  Since  we  assume  that  the  reference  node  is  not 

contained  in  the  loop  and  there  is  no  coupling  among  the  voltages  of  the 
branches  of  the  loop,  then  there  is  a  one-to-one  correspondence  between 
each  branch  of  the  loop  and  the  corresponding  'positive'  node  and  each 
column  in  [^contains  exactly  a  +1  and  a  -1,  therefore,  is  singular  and 
zero  diagonal  elements  will  be  generated  during  the  Gaussian  elimination. 


The  following  Example  2-3  illustrates  Theorem  2.2. 

Example  2.3:  A  loop  of  current  variables  (Fig.  2.3) 

THe  circuit  equations  formulated  by  method  1  for  the  circuit  shown  in 
Fig.  2.3  in  a  transient  analysis  using  a  backward  Euler  Formula  with 
timestep  h  have  the  following  form: 
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The  submatrix  B^is  singular,  therefore  during  the  Gaussian  elimination 
a  zero  diagonal  element  will  be  generated  at  location  (4,4). 


2.2.  New  Partitioning  and  Ordering  Strategy 

From  the  previous  section,  we  conclude  that  the  topological  reasons 
for  zero  diagonal  elements  being  generated  in  the  modified  nodal  approach 
are:  (1)  cutsets  of  current  variables  and  (2)  loops  of  current  variables. 
Here  we  present  a  new  partitioning  and  ordering  strategy  which  has  the 
following  good  features: 

(1)  zero  diagonal  elements  are  avoided  before  the  Gaussian  elimination  and 
during  the  Gaussian  elimination  in  essentially  all  practical  circuits; 

(2)  it  is  efficient  and  the  number  of  fills  is  less  than  that  of  previous 


1 
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methods  ; 

(3)  it  is  easy  to  implement  and  the  partitioning  and  ordering  are  done  in 
the  preprocessing  phase,  so  it  is  well  suited  for  the  use  of  sparse  matrix 
techniques. 

Consider  a  linear  (or  linearized)  circuit  which  contains  independent 
current  and  voltage  sources,  two  terminal  resistors,  capacitors,  inductors 
and  all  types  of  controlled  sources.  We  assume  that  the  circuit  contains 
neither  loops  of  only  (independent  and  dependent)  voltage  sources  and 
inductors  nor  cutsets  of  only  (independent  and  dependent)  current  sources 
and  capacitors. 


In  the  modified  nodal 

approach,  the 

circuit 

variables  consist 

of 

node-to-datum  voltage  V 

-n 

together  with 

a  subset 

of 

branch  currents 

V 

(Henceforth  those  branches 

are  referred  to 

as  current 

branches.)  In 

the 

proposed  ordering  strategy,  the  node  voltages  are  partitioned  into  two 
subsets,  and  V9,  and  Ib  is  partitioned  into  three  subsets,  1^,  I0  and 
I3.  The  components  of  consist  of  the  currents  in  the  (dependent  and 
independent)  voltage  sources,  and  are  in  turn  partitioned  as  follows: 


=branch  currents  of  the  independent  voltage  sources. 


=branch  currents  of  the  voltage-controlled  voltage  sources. 


IC(,V  =branch  currents  of  the  current-controlled  voltage  sources. 
The  components  of  I2  and  consist  of  the  remaining  currents  which  are 


circuit  variables. 
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Let  a  graph  G^.  (posstbly  disconnected)  be  first  constructed  to  include 
all  the  current  branches,  with  all  the  other  branches  removed.  If  Gt 
contains  loops,  then  a  tree  (or  forest)  is  chosen,  with  only  finite-valued 
resistors  as  links.  This  is  always  possible  since  by  assumption  no  loops 
of  only  voltage  sources,  inductors  and  zero-valued  resistors  exist  in  the 
circuit.  Let  be  the  set  of  currents  in  the  links  of  G3 ,  then  these 

links  can  not  form  cutsets  [28].  The  components  of  I9  consist  of  currents 
in  the  inductors  and  the  remaining  currents  of  the  current  resistors. 

The  components  of  V^consist  of  the  following: 

Vv  =set  of  'positive'  node  voltages  of  the  independent  voltage 
sources . 


VVCV  Sset  of  'Positive*  node  voltages  of  the  voltage-controlled 
voltage  sources. 


“ccv 


=set  of  ’positive*  node  voltages  of  the  current-controlled 
voltage  sources. 


VfaC  =set  of  ’positive'  node  voltages  of  the  the  I2  branches. 


The  components  of  consist  of  the  remaining  node  voltages. 
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The  following  algorithm  is  followed  in  selecting  the  'positive'  node 
voltages  defined  above: 

Algorithm 

(1)  The  ungrounded  nodes  of  all  grounded  current  branches  belonging  to 
1^  or  L,  are  chosen  first  as  'positive'; 

(2)  Let  b  be  the  number  of  branches  whose  currents  belong  to  I  or  I. 

3  ~1  -2 

and  which  are  incident  at  node  j .  Whenever  a  node  of  a  current  branch  is 
chosen  as  'positive',  the  number  b^  at  its  'negative'  node  k  is  reduced  by 
one . 

(3)  If  the  b^  value  of  node  k  of  a  current  branch  is  one  and  that  node 
has  not  been  previously  selected  as  'positive',  then  node  k  is  selected 
'positive'  for  that  particular  current  branch.  If  more  than  one  node  have 
their  b^  value  equal  to  one  and  if  some  of  these  nodes  do  not  ■)%>  s  a 
conductance  (i.e.,  a  resistance  whose  current  is  not  a  circuit  variable) 
connected  to  them,  then  one  of  these  nodes  is  chosen  'positive'  first. 
Otherwise,  any  one  of  the  nodes  that  has  its  b^  value  equal  to  one  is 
chosen  'positive'. 

Step  (2)  and  (3)  are  repeated  until  all  the  branches  corresponding  to 
1^  and  ^  have  been  processed.  Note  that  up  to  this  point  there  is  always 

at  least  one  node  whose  b,  value  is  one.  This  is  because  I  and  I  do  not 

k  ~1  '2 

form  loops.  Note  also  that  the  number  of  positive  nodes  is  equal  to  the 
number  of  elements  in  1^  and  Ij.  The  polarities  of  the  currents  in  the 
current  branches  are  associated  with  the  positive  node  assignments. 


.1 

A 


-20- 


Partitioning  and  ordering  the  circuit  variables  in  the  order  of  I  , 
1-7.  VL,  V9,  and  1^,  and  writing  the  modified  nodal  equations  in  the  usual 
way  [25],  we  get  the  following  equation  structure: 
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Where  the  A^s  contain  the  partial  derivatives  of  the  Kirchhoff 
current  equations  with  respect  to  the  circuit  current  variables,  _I^ ,  ^ » 
Ij ,  and  thus  contain  0,  +1 ,  -1  only. 

By  interchanging  the  rows  corresponding  to  with  the  rows 

corresponding  to  I,  and  I,,  Eq.  (2.3)  can  be  written  in  the  following  form 

'■wi.  /-w<- 

(This  interchange  is  equivalent  to  off-diagonal  pivoting  and  is  done  in 
practice  by  a  simple  change  in  the  pointer  system  rather  than  a  physical 
interchange  of  data  in  the  rows.) 
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(2.4) 


(1) 


The  circuit  variables  are  partitioned  into  three  subgroups: 
and  I0,  (2)  V„,  and  (3)  the  remaining  variables.  The  Markowitz  scheme  [29] 
is  used  to  minimize  the  number  of  operations  within  each  subgroup.  After 
reordering,  Eq.  (2.4)  can  now  be  solved  by  Gaussian  elimination  or  LU 
factorization. 


2 . 3  Theorems  and  Examples 

If  there  are  no  current-controlled  current  sources  or  if  the 
current -controlled  current  sources  are  not  incident  at  the  'positive' 
nodes,  then  within  the  first  two  subgroups  all  the  diagonal  elements  remain 
1's  and  all  the  nonzero  off-diagonal  elements  are  -1's  during  the  Gaussian 
elimination  process,  so  the  leading  part  of  the  elimination  can  be  done 
simply  by  addition.  The  proof  is  given  below  in  Theorem  2.3*  Let  us 
consider  the  first  subgroup,  the  submatrix  associated  is  the  node-to-branch 
incidence  matrix  A  for  the  'positive'  nodes  and  the  currents  belong  to  I. 
and  I  .  Let  us  denote  the  directed  graph  of  those  nodes  and  currents  by 
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Gj.  Due  to  our  partitioning  and  ordering  strategy,  there  are  no  loops  in 
GI>  30  ia  1133  +1 's  on  the  dia8onal,  0's  or  -1 's  on  the  off-diagonal  and 
is  square  and  nonsingular. 

Theorem  2. 3.  For  any  diagonal  pivoting  the  LU  factors  of  A^  have  the 
following  special  properties:  all  the  diagonal  elements  remain  +1 's  and 

all  the  nonzero  off-diagonal  elements  are  -1  's. 


Proof:  Let  be  formulated  with  current  chosen  as  the  first  pivot 
where  1^  flows  in  branch  b^,  which  is  connected  between  node  i  and  node  j • 
After  row  and  column  interchange  the  first  row  and  column  of  A^  will  have 
the  following  form: 


1 


12 


A  =  . 
~a 

j 


l 


1 


n 


1 


s 


where  node  j  is  assumed  to  be  in  G^;  otherwise  column  one  would  be  all 
zeros  below  the  diagonal.  Note  that  the  entry  a^j  =  0  because  Gj  does  not 
have  any  loops  and  a^  i=2,3f  ••••»"  are  either  zero  or  -1.  Pivoting  on  a^ 
amounts  simply  to  adding  row  1  to  row  j.  Since  adding  any  two  rows  in  the 
incidence  matrix  of  a  directed  graph  produces  a  row  with  0,  -1  or  ♦I 
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enties,  row  j  will  then  contain  0  and  -1 's  with  +1  on  the  diagonal  because 


Let  the  submatrix  generated  by  pivoting  on  ani  be  denoted  by  A  .  A 

A  A 

can  be  considered  as  the  incidence  matrix  of  a  directed  graph  G^  where  G^. 
is  derived  from  by  removing  branch  b^  and  merging  node  1  with  node  j. 

A 

Thus  A  has  the  same  properties  as  A  ,  and  pivoting  on  its  first  diagonal 
entry  will  produce  a  submatrix  with  ones  on  the  diagonal  and  0  and  -1  's 
elsewhere.  This  proves  the  theorem. 


The  reasoning  for  the  second  subgroup  is  similar  to  Theorem  2.3- 
Now  we  would  like  to  present  the  main  result. 


Main  Result .  For  any  network  which  has  a  unique  solution,  if  the 
partitioning  and  orderinng  strategy  proposed  here  is  used  to  solve  the 
modified  nodal  equations,  then  no  zero  diagonal  elements  will  be 
encountered  during  the  Gaussian  elimination  process,  except  for  the  case 
when  controlled  sources  or  negative-valued  elements  with  some  specific  set 
of  circuit  element  values  result  in  perfect  cancellation. 

Proof:  There  are  two  kinds  of  zero  diagonal  elements  which  may  be 
encountered.  One  type  is  due  to  the  formulation  method  [25]  and  occurs  in 
the  network  matrix  before  the  elimination  process  starts.  These  zero 
diagonal  elements  are  avoided  by  interchanging  the  rows  corresponding  to 
the  'positive'  node  voltages  with  the  rows  corresponding  to  Ij  and  1^. 
During  the  elimination,  topologically,  the  zero  diagonal  elements  are 
caused  either  by  ordering  loops  of  current  variables  first  or  by  a  floating 
subnetwork  which  results  by  ordering  a  cutset  of  current  variables  last . 
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Both  of  these  situations  are  prevented  by  partitioning  away  from  I  and 

ordering  I  and  I  first ,  so  no  loops  can  be  formed  by  I  and  I  .  Since  I 

"w  -4  -«*  i  ~J2. 

consists  of  currents  in  the  links,  so  L  will  not  form  cutsets,  and  thus  no 
floating  subnetworks  will  result. 

Alternatively,  this  theorem  can  be  proved  as  follows:  Since  all  the 
currents  are  ordered  first  and  eliminated  first,  from  Theorem  2-3,  we  know 
that  the  elimination  of  these  currents  will  not  generate  zero  diagonal 
elements.  After  all  these  currents  are  eliminated,  if  is  empty,  we  are 
left  with  nodal  matrix  equations,  then  no  zero  diagonal  elements  will  be 

generated;  if  1 3  is  not  empty,  since  I3  can  not  form  loops  or  cutsets,  so 

no  zero  diagonal  elements  will  be  generated. 

A  more  rigorous  and  general  proof  can  be  found  in  [30]. 

In  the  following  we  would  like  to  use  the  new  ordering  scheme  to  solve 
those  examples  used  in  Section  2.1. 

Example  2. 1 : 


If  our  approach  is  used, 


initially,  the  matrix  formulated  is: 


After  interchanging  rows,  the  resulting  matrix  is: 


No  zero  diagonal  elements  will  be  encountered  during  the  course 
Gaussian  elimination. 


Example  2. 2: 


If  our  approach  is  used,  initially,  the  matrix  formulated  is: 


”o  1  0  0~ 

"V 

z 

1  5t  0 

l 

m 

0 

0  -C,  C,  0 

V2 

0 

jJ>  0  0  G 

0 

—  _ 

After  interchanging  rows,  the  resulting  matrix  is: 
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i 

o 

0 

0 


-G, 


0 


0 


No  zero  diagonal  elements  will  be  encountered  during  the  course  of 
Gaussian  elimination. 


Exaaele.  LJ.- 


If  our  approach  is  used, 


initially,  the  matrix  formulated  is: 


After  interchanging  rows,  the  resulting  matrix  is: 
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No  zero  diagonal  elements  will  be  encountered  during  the  course  of 
Gaussian  elimination. 

These  examples  show  that  our  approach  indeed  can  avoid  zero  diagonal 
elements  before  the  elimination  and  during  the  elimination.  However,  as 
mentioned  before,  if  there  are  controlled  sources  or  negative  valued 
elements  with  specific  set  of  element  values,  zero  diagonal  elements  may  be 
produced  due  to  perfect  cancellation. 

2.4.  Results 

The  implementation  of  this  new  algorithm  into  the  EEC-10  version  of 
SPICE2  has  resulted  in  YSPICE.  In  YSPICE,  the  'positive’  nodes  are  first 
determined  by  the  algorithm  presented  in  Section  2.2.  The  network  matrix 
is  constructed  using  the  element  stamps  as  in  [31].  The  sparse  matrix 
reordering  is  carried  out  using  the  Markowitz  criterion  [29,32]  with 
diagonal  pivoting.  The  row  interchange  is  done  by  one  extra  set  of 
pointers. 
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Examples  which  caused  computational  problems  in  the  original  version 
of  SPICE2  due  to  pivoting  on  zero  diagonal  elements  were  successfully 
analyzed  using  YSPICE.  Furthermore,  the  results  we  obtained  show  that  in 
many  cases  the  number  of  fills  produced  by  our  ordering  strategy  is  far 
lower  than  that  produced  by  previous  methods,  resulting  in  less 
computational  cost,  and  at  the  same  time,  more  accurate  solutions. 

Here  a  small  selection  of  the  examples  analyzed  by  YSPICE  is  presented 
and  the  results  are  compared  with  those  obtained  by  SPICE2. 

Example  2.4:  The  two  circuits  shown  in  Figs.  2.4(a)  and  (b)  were  analyzed 
using  SPICE2  and  YSPICE.  The  CPU  times  required  by  the  equation  solving 
subroutines  in  both  programs  for  both  circuits  are  given  in  Table  2.1. 

Table  2.1  Simulation  Data- 


Circuit 

CPU  time 

for  the  equation 
solving  subroutine 

number  of 
variables 

number  of 
operations 
per  iteration 

YSPICE 

2.4(a)— — — 1 

0.9090  sec. 

7 

16 

SPICE2 

1 . 9740  sec . 

7 

71 

YSPICE 

2.4(b) 

0.031  sec. 

10 

30 

SPICE2 

0.108  sec. 

10 

■  ■■■  _ 

101 
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The  difference  in  the  number  of  operations  between  YSPICE  and  SPICE2 
in  Table  2.1  can  be  explained  as  follows:  In  SPICE2,  the  matrix  formulated 
by  the  modified  nodal  approach  for  the  circuit  in  Fig.  2.4(a)  is  as  shown 
in  Fig  2.5(a).  It  can  be  seen  that  although  the  number  of  off-diagonal 
elements  of  the  rows  and  columns  corresponding  to  Ip  I?,  and  1^  is  small, 
they  are  not  chosen  as  pivots  until  their  corresponding  zero  diagonal 
entries  are  filled.  The  delay  causes  the  number  of  fills  to  increase 
greatly.  In  YSPICE,  the  matrix  formulated  for  the  circuit  in  Fig.  2.4(a) 
is  as  shown  in  Fig.  2.5(b).  It  can  be  seen  that  the  number  of  fills  is 
now  zero  due  to  the  off-diagonal  pivoting,  and  consequently,  the  number  of 
operations  is  reduced. 

Example  2.5:  The  circuit  shown  in  Fig.  2.6  was  also  analyzed  using  both 
SPICE2  and  YSPICE.  The  results  of  the  dc  analysis  are  shown  in  Table  2.2. 


Table  2.2  Simulation  Data. 


Node 

2 

3 

4 

5 

6 

YSPICE 

node  voltages 

2.000  V 

0.000  V 

_ 

0.000  V 

_  .. 

SPICE2 

node  voltages 

2.000  V 

2.324  V 

_ 

2.324  V 
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3  1  i i  2  4 

~X  X  0  X  0  X  o“ 

X  X  1  X  0  X  0 

0  18  8  0  8  0 
X  X  8  X  1X0 

0  0  0  1  8  8  0 

X  X  8  X  8  X  1 

_Q  0  0  0  0  1  8_ 

(a) 

i  i  i  2  i  4  1  2  4  3 

”iooxxxx“ 

0  1  0  X  X  X  X 

0  0  1  X  X  X  X 

0  0  0  1  0  0  0 

0  0  0  0  1  0  0 

0  0  0  0  0  1  0 

_Q  0  0  X  X  X  X_ 

(b) 

Fig.  2.5(a)  Structure  of  the  Network  Matrix  for  the 

Circuit  in  Fig.  2.4(a)  formulated  by  SPICE2 

(b)  Structure  of  the  Network  Matrix  for  the 

Circuit  in  Fig.  2.4(a)  formulated  by  YSPICE 
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Our  approach  gave  correct  results  for  this  circuit  while  SPICE2  gave 
inaccurate  results.  These  inaccuracies  can  be  explained  as  follows:  In 
SPICE2,  if  a  diagonal  element  becomes  too  small,  then  it  is  replaced  by 
1.0x10~12.  In  this  circuit  this  approach  is  equivalent  to  connecting  a 
1.0x10^'^*  resistor  from  node  4  to  ground.  In  thi3  circuit  the  diode  is 
reverse  biased,  the  equivalent  resistance  used  in  SPICE2  for  this  diode  is 
0. 721x10 as  a  result  the  computed  I  in  SPICE2  is  2. 324x10  2A,  instead 

of  the  correct  value,  which  should  be  0.0A.  This  inaccuracy  in  computing 
1^  makes  V^  =  -1.6760V  and  Vg  =  1 675 •  9999V  instead  of  0.0V. 

2-5  Discussion 

In  this  chapter  we  have  presented  an  ordering  stategy  to  be  followed 
when  the  modified  nodal  approach  is  used.  When  this  new  strategy  is  used, 
the  possibility  of  selecting  zero  diagonal  pivots  is  reduced.  The  new 
strategy  eliminates  the  need  for  having  to  continuously  check  the  pivot  and 
to  replace  it  by  a  nonzero  value  in  case  a  zero  is  generated,  as  is  done  in 
some  existing  strategies,  which  is  both  time  consuming  and  inaccurate. 

In  addition,  if  the  currents  through  the  voltage  sources  are  not 
needed,  our  ordering  scheme  provides  a  convenient  way  of  reducing 
computation  by  performing  the  backward  substitution  step  only  partially  to 
obtain  the  required  variables. 

Although  by  performing  off-diagonal  pivoting,  the  circuit  matrix  loses 
its  symmetry  and  increases  the  complexity  of  the  program,  however,  this  is  not 
a  serious  drawback.  In  fact,  in  many  of  the  examples  which  we  have 
analyzed,  we  have  observed  that  by  using  off-diagonal  pivoting,  the  number 
of  fills  is  much  less  than  that  produced  by  other  methods. 


A 
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III.  MODIFIED  NEWTON  METHOD  AND  PIECEWISE-NONLINEAR  APPROACH 

In  a  computer-aided  circuit  simulation  program,  if  a  circuit  contains 
nonlinear  elements,  then  a  nonlinear  solution  method  is  required  to  solve 
the  nonlinear  algebraic  equations  in  both  dc  analysis  and  transient 
analysis  of  the  circuit.  There  are  many  nonlinear  solution  methods 
available,  but  the  one  most  widely  used  is  the  Newton-Raphson  method.  This 
method  has  the  desirable  property  that  its  rate  of  convergence  is  quadratic 
in  the  neighborhood  of  the  solution. 

Although  The  Newton-Raphson  method  ha3  excellent  local  convergence 
properties,  it  has  problems  [1,331  when  the  initial  guess  is  not  close  to 
the  solution,  such  as  numerical  overflow,  slow  convergence,  or  no 
convergence.  Several  modified  Newton-Raphson  methods  have  been  proposed  to 
try  to  resolve  the  above  problems,  and  the  performance  of  the  basic 
Newton-Raphson  method  has  been  improved  to  some  extent.  Here  a  new 
method  -  the  piecewise  nonlinear  approach  -  is  presented,  and  examples  are 
given  which  show  even  further  improvement.  This  method  evolved  from  the 
piecewise  linear  method  and  previous  modified  Newton-Raphson  methods,  so  it 
has  the  advantages  of  both  methods.  However,  this  new  method  is  still  at 
the  experimental  stage,  no  definite  conclusion  about  it  has  been  obtained. 

This  chapter  begins  with  the  introduction  of  the  Newton-Raphson 
method.  In  Section  3.2,  problems  with  the  Newton-Raphson  method  are 
illustrated.  In  Section  3*3  the  piecewise  nonlinear  approach  is  presented. 
In  Section  3.4,  a  new  modified  Newton-Raphson  method  for  bipolar  devices  is 
detailed  and  the  piecewise  nonlinear  approach  for  bipolar  devices  including 
the  avalanche  effect  is  given  in  Section  3.5.  In  Section  3.6,  the 
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piecewise  nonlinear  approach  for  the  MOSFET  is  described.  In  Section  3-7, 
a  discussion  of  the  piecewise  nonlinear  approach  is  given. 


3.1.  The  Newton-Raphson  Algorithm 


Let  the  set  of  nonlinear  equations  be 


F(X)  =  0 


(3.1) 


If  Xk  is  the  solution  at  the  kth  iteration,  from  Taylor  series  expansion, 


we  have 


F(X)  =  F(Xk)  J(Xk)  (  X  -  X^.  )  +  higher  order  terms 


(3.2) 


Eq.  (3-2)  is  used  to  obtain  a  solution  to  Eq.  (3-1)  under  the  assumption 
that  the  higher  order  terms  are  negligible.  Thus,  we  write 


F(Xk)  ♦  J(X  )  (X  -  Xk)  -  0 


(3.3) 


Solving  Eq.  (3.3)  for  X^^  we  obtain 


(3.4) 


Eq.  (3*4)  is  called  the  Newton-Raphson  iteration  algorithm. 


3.2.  Problems  with  the  Newton-Raphson  Algorithm 


The  problems  of  numerical  overflow  and  slow  convergence  can  be 
illustrated  by  a  simple  diode  circuit  shown  in  Fig.  3.1.  The  branch 


constraint  for  the  typical  semiconductor  diode  has  the  form  i  *  I  (e  -  I) 

Given  am  initial  estimate  V  to  the  solution  for  this  circuit  as  shown  in 

o 

Fig.  3.1.  it  is  not  uncommon  for  the  solution  to  the  next 
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Newt  on-Raphson  iterate  to  be  in  the  neighborhood  of  VDD  as  shown  in  Fig. 

3.1.  If  the  exponent  in  the  diode  equation  is  too  large,  overflow  may 

occur.  Even  if  overflow  does  not  occur,  convergence  will  be  extremely  slow 

because  of  the  very  large  slope  of  the  diode  characteristic  in  this  region. 

One  modified  Newt on-Raphson  algorithm  which  has  proved  successful  in 

avoiding  the  above  problems  was  proposed  by  Colon  [33]-  In  this  algorithm 

iteration  on  current  is  employed  if  exceeds  a  reference  junction 

voltage  V  ,  this  is  illustrated  in  Fig.  3.2.  This  algorithm  is  used  in 

REF 

the  SPICE2  program. 

Another  problem  with  the  Newt on-Raphson  algorithm  is  the  lack  of 
convergence.  This  is  illustrated  in  Fig.  3-3.  The  iterate  solutions  will 
oscillate  between  VQ  and  and  never  converge  to  the  solution  V*. 

3.3*  Piecewise  Nonlinear  Approach 

This  is  a  new  approach  which  has  the  advantages  of  the  piecewise 
linear  approach  and  the  modified  Newt on-Raphson  methods.  However,  this 
method  is  still  at  the  experimental  stage,  the  proof  of  global  convergence 
or  conditions  for  global  convergence  has  not  been  obtained.  We  restrict 
our  discussion  to  two  terminal  elements.  In  this  approach,  first,  a  set  of 
breakpoints  is  chosen  and  the  device  characteristic  is  partitioned  into 
several  nonlinear  pieces.  The  partition  must  satisfy  the  following 
constraints: 

(1)  each  piece  must  be  monotonic  and  the  first  derivative  must  be 


monotonic  too; 
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(2)  the  piece  must  be  chosen  to  be  suitable  for  the  current/voltage 
iteration  to  avoid  numerical  overflow  and  to  hasten  convergence; 

(3)  the  number  of  pieces  must  be  kept  as  small  as  possible  to  avoid 
the  possibility  of  slow  convergence. 

After  the  partitioning,  the  following  algorithm  is  used  to  perform  the 
iteration: 

(1)  choose  an  initial  guess  VQ; 

(2)  linearize  the  circuit  by  Newton-Raphson  method  and  find  the 

iterate  solution  (k  =  0); 

(3)  if  Vk+L  is  within  the  original  piece,  then  use  the  modified 

Newton-Raphson  method  to  choose  V^+l'  and  continue  the  iteration; 
otherwise,  if  the  next  breakpoint  in  the  direction  of  change  has  not  been 
chosen  before,  choose  V^+i  equal  to  it;  otherwise  go  into  the  adjacent 
piece,  if  the  derivative  is  not  continuous  at  this  breakpoint,  then  choose 
this  breakpoint  as  again  but  use  the  new  derivative;  otherwise, 

choose  the  other  breakpoint  as  and  continue  the  iteration. 

This  approach  is  illustrated  in  the  graphical  solution  that  is  given 
in  Fig.  3.4.  Here  the  tunnel  diode  characteristic  is  partitioned  into 
four  pieces.  The  initial  guess  is  VQ  located  in  piece  I.  The  solution  of 
the  linearized  circuit  is  which  is  not  in  piece  I,  so  is  chosen  to  be 
equal  to  breakpoint  1.  The  solution  of  the  new  linearized  circuit  is  V-> 
which  is  still  not  in  piece  I.  Enter  piece  II  and  choose  breakpoint  2  as 
V 2*  The  iterate  solution  is  not  in  piece  II,  go  into  piece  III  and 
choose  breakpoint  3  as  V^,  the  iterate  solution  is  not  in  piece  III. 


1 V2  02v3  v5  v4  VDD 

V  PP-7024 


of  the  Piecewise  Nonlinear  Approach. 
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Enter  piece  IV  and  choose  V^  as  V^,  this  time,  the  solution  V^  is  in  piece 
IV.  Continue  the  iteration  by  modified  Newton-Raphson  method  until  the 
convergence  is  obtained. 


3.4.  A  Mew  Modified  Newton-Raphson  Method  for  Bipolar  Devices 

In  the  piecewise  nonlinear  approach,  the  diode  characteristic  i3 
partitioned  into  three  pieces  as  shown  in  Fig.  3.5.  In  region  III,  in 
order  to  avoid  numerical  overflow  and  to  compensate  for  large  higher  order 
terms,  the  modified  Newton-Raphson  method  must  be  used  [1,33,34]. 

Consider  the  simple  diode  circuit  shown  in  Fig.  3.6.  The  nodal 
equation  is 

F(V)  V  ,  I  (eV/Vt  -  1)  -  0  (3.5) 

R  3 

By  a  Taylor  series  expansion  we  obtain 

»  F  fW  ^  2 

F(Vk+1)  =  F(Vk)  +  F  (Vk)  (Vk+1  -  Vk)  — (Vk+1  '  Vk)  (3.6) 

2 

+  higher  order  terms 


where  F  (Vk)  (Vk+1 
F”(Vk) 


<Vk+l 


-L+il  e^vt 

R  Vt 


Vk/Vt 

2*Vt  6 


)  (V, 


k+1 


(V, 


k+1 


and 


If  we  assume  that  R  is  sufficiently  large,  then  the  ratio  of  the  third  term 
to  the  second  term  in  Eq.  (3*6)  is 


(V, 


k+1 


V 


2*v 


(3.7) 


Fig.  3  6(a)  Simple  Diode  Circuit. 

(b)  Newton-Raphson  Iteration  Solutions  for  (a). 
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If  (V  -  V  )  is  not  small  compared  to  2 V„,  then  the  assumption  that  the 
k+1  k  t 

higher  order  terms  in  Eg.  (3-2)  are  small  and  can  be  neglected  is  not 
true,  so  the  correction  term  AV^  =  obtained  by  the  Newton-Raphson 
method  may  not  be  good. 


Let  AVk  be  the  modified  correction  term  such  that 

,  VAVk“VDD  (VAVk)/Vt 

F(Vk+AVk)=A - R£“+IS(e  -1>  -  0 


(3.8) 


Because  AVk  satisfies 


F(Vk)  +  F'(Vk)Avk=0 


(3.9) 


so 


F(Vk)  -v  F'(Vk)hvk  s  F(Vk  +  Avy 
From  Eq.  (3-10)  we  obtain 


(3.10) 


(AVW 

R 


+  I 

s 


<V4vk>/vt 


v\1+ 

e  (1+ 


(3.11) 


From  Eq.  (3*11)  we  obtain 


AV\ 


e 


AVk"AVk 


r  V,  /V 

rse  k  c 


(3.12) 


If  (AVk  -  AVk)/R  is  sufficiently  small  compared 
V  /V 

term  Ige  k  t,  then  the  equation 


to  the  exponential 
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^  =  vc  m(i+  y-) 


(3.13) 


yields  a  good  approximation  to  the  true  solution. 


Now  we  would  like  to  find  out  the  relationship  of  Eq.  (3-13)  to 
current  iteration.  For  current  iteration,  after  obtaining 


Vls  Vk  +  iVk 


(3.14) 


we  can  obtain 


Vvt 


Vi  2  V*  -  -if 


(3.15) 


>  -Ig,  then  there  is  a  point  Vk+1  =  Vk  +  on  the  diode 


charateristic  whose  current  is  w 


I  (e(Vk^Vk)/Vt  .  1)  *  i  (eVvt  -  i)  +-£.  eVk/Vt  Av 


(3.16) 


We  obtain 


A  A  V 

=  Vtl»(1  .-*> 


(3.17) 


We  can  see  that  Eq.  (3-17)  is  identical  to  Eq.  (3. 13),  also  we  can 
see  that  the  condition  for  Eq.  (3-16)  to  have  a  solution  is 


1  +  >  o 


(3.18) 


Since  if  AV^  ^0,  then  Eq.  (3.  18)  is  satisfied,  so  we  only  need  to 
consider  the  situation  when  Av^  <  0.  This  condition  can  be  explained 

graphically  in  Figs.  3*7(a),  (b)  and  (c).  From  Eq.  (3-15)  we  see  that 

.  A 

-I3  *:  I-L  for  -Vc  ^  AV^  f  0.  Thus,  if  the  current  intercept  of  the 

load  line  with  the  linearized  diode  curve  lies  in  this  range,  then  JaV^J 
cannot  exceed  V..  and  convergence  can  be  quite  slow  if  voltage  iteration  is 
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* 

V  -  V, 


P!E-+is(//Tt  -  l)  -o 


(3.21) 


so  from  Eqs.  (3-19),  (3-20)  and  (3-21),  we  obtain  the  following 

conclusions : 


(3.22) 


(1 )  If  Vk  *  V  then  &Vk  4.  0. 

V  -v  -V 
*  k  DD  t- 

(2)  If  Vk  -AV  and  R  A  - I  ,  then  -V  4  AVk  0.  (3.22) 

S  *  V  -V  -V 

(3)  If  R  and  Vk  are  chosen  such  that  Vk  ^  V  and  R  -i  — -  , 

s 

and  voltage  iteration  is  used  in  the  forward  region,  then  the  number  of 


V  -Vv 

iteration  is  lowerbounded  by  _ E. 


For  example,  if  V  -  V  =  1.0V  and  V  =  0.025V, 
k  t 

* 

V  -V 

then  _ k  =  40. 


The  above  conclusions  show  why  current  iteration  oust  be  used  in  the 
forward  region. 

Now  we  would  like  to  examine  under  what  conditions  current  iteration 
should  be  used  and  if  there  is  a  (such  as  the  VREF  used  in  SPICE2)  to 

determine  whether  current  iteration  or  voltage  iteration  should  be  used. 

*  A 

Let  us  consider  the  simple  diode  circuit  in  Fig.  3«8.  v  and 


satisfy  Eq.  (3*23) 


//vt  „  eVvt)  =_Il 


(3.23) 


Let  us  consider  the  limiting  case  when  V  -  «  V  f  then  Eq.  (3-23) 


can  be  rewritten  as: 


R*!^  eV*/VC(v*  .  -  V  -  V* 


(3.24) 
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R*Ig  v*/v 

so  if — — — e  c>>  i,  then  current  iteration  is  preferred;  else  if 

v*/Vt<<  1,  then  voltage  iteration  is  preferred.  These  two  conditions 

Vt  e 

are  illustrated  in  Figs.  3.9(a)  and  (b). 

From  Eq.  (3-24)  we  can  conclude  that  VRRF  must  satisfy 


R*I  v  /v 
*  s  REF'  t  _  , 

- - e  -  1 


(3.25) 


Since  the  value  of  R  is  not  a  constant,  there  i3  no  universal  VREF. 

Experiments  of  the  simple  diode  circuit  with  different  values  of  R  and  V 

were  done  to  test  the  above  conclusion,  and  the  data  are  given  in  Figs. 

3.10(a),  (b),  (c),  and  (d).  These  data  confirm  Eq.  (3-25).  In  Fig 

3.10(a),  R*1s0v  'vt  is  always  much  larger  than  one,  this  explains  why 
vt 

current  iteration  is  always  better  than  voltage  iteration;  in  Fig. 

R*I  *  , 

V  /Vt- 

3.10(b),  when  V^  is  less  than  0.7V,  v£  e  z  is  Iess  than  one,  voltage 

iteration  is  better  than  current  iteration;  in  Fig.  3.10(c),  when  V  is 
R*IS  *, 

less  than  0.5V,  V^"e  /  C  is  less  tliaa  one>  so  v°Ita8e  iteration  is  better 
than  current  iteration;  in  Fig.  3.10(d),  because  ^Vk,  mav  be  less  than  -1 , 
strict  current  iteration  in  region  III  can  not  be  done.  Whenever — ^ 


is 


less  than  -1,  the  next  guess  is  reset  to  zero,  and  current  iteration  is 
resumed,  for  this  approach,  current  iteration  is  always  better  than 
voltage  iteration. 


In  the  conventional  current/voltage  iteration  approach,  such  as  the 
one  used  in  SPICE2,  there  is  a  universal  VRER,  if  VR+^  exceeds  VRER,  then 
current  iteration  is  used;  otherwise,  voltage  iteration  is  used.  In 
SPICE2,  this  VREp 


is  set  to  the  point  of  minimum  radius  of  curvature: 


iterotior 
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V 

REF 


(3.26) 


From  the  above  analysis,  we  can  see  that  this  VRR R  does  not  provide 
any  guarantee  of  fast  convergence.  The  simple  diode  circuit  was  used  again 
to  test  the  approach  used  in  SPICE2,  VQD  =  5V ,  R  =  1Q00K,  the  number  of 
iterations  used  by  SPICE2  is  12,  while  the  number  of  iterations  for  strict 


current  iteration  is  only  3- 


There  is  another  problem  associated  with  the  conventional 
current /voltage  iteration  used  in  SPICE2.  This  problem  is  illustrated  in 
Fig.  3.11.  Let  us  assume  that  the  initial  guess  is  VQ  and  the  first 
iterate  solution  is  .  Now  we  do  not  taow  which  load  lines  we  are 
encountering,  because  both  load  lines  will  give  us  v^.  If  it  is  load  line 
1,  then  voltage  iteration  should  be  used;  if  it  is  load  line  2,  then 
voltage  iteration  is  too  slow.  In  SPICE2,  because  is  less  than  vREFi 
voltage  iteration  is  used  for  both  cases.  Experimental  results  show  that 
for  VDD  s  -5V  and  R  =  1000K  the  number  of  iterations  used  by  SPICE2  is  12. 
This  problem  can  be  solved  by  using  the  piecewise  nonlinear  approach. 
Whenever  this  situation  occurs,  then  the  next  guess  is  changed  to  zero.  If 
it  is  load  line  1,  the  next  iterate  solution  is  in  the  first  quadrant  and 
voltage  iteration  is  used  to  obtain  the  solution.  If  it  is  load  line  2, 
the  next  iterate  solution  is  in  the  third  quadrant .  The  number  of 
iterations  used  by  recognizing  that  the  load  line  2  is  being  used  and 
changing  the  next  guess  to  zero  is  3- 


Also  let  us  examine  Eq. 

I 

smaller  than  1,  then  aVr  a 


(3-13)  again- 


When 


is  positive 

I 


AVR-  II  the  difference  between  AVr 


and  much 
and  aV^  is 


small  compared  to  the  iteration  error  tolerance,  then  there  is  no  need  to 


v 
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Pig'  3,11  Another  Problem  with  the  Colon  Method. 


-59- 


do  the  transformation.  Let  us  assume  the  error  tolerance  is  10*^V,  then 

AVk 

when  -  is  less  than  0.01,  there  is  no  need  to  do  the  transformation. 

The  result  of  all  the  above  analysis  is  a  new  iteration  scheme.  The 
flowchart  for  this  new  approach  is  shown  in  Fig.  3.12(a),  the  experimental 
data  for  the  simple  diode  circuit  are  also  given  in  Figs.  3.10(a),  (b), 

(c),  and  (d).  These  data  show  that  this  algorithm  works  well  for  resistor 
load  diode  circuits.  However,  when  transistor  circuits  are  solved,  the 
load  line  generated  by  linearization  changes  during  the  iterations  and  the 
algorithm  goes  into  limit  cycle  for  some  circuits.  If  the  piecewise 
nonlinear  method  presented  in  Section  3-3  (it  corresponds  to  a 
Katzenelson's  type  algorithm  [40]  for  the  piecewise  linear  approach)  is 
used,  then  probably  the  limit  cycle  problem  will  not  occur.  But  the 
piecewise  nonlinear  method  only  allows  one  diode  to  change  regions  at  a 
given  iteration,  so  the  convergence  rate  i3  slow;  also  the  piecewise 
nonlinear  method  requires  a  linear  search  to  accomplish  the  task  tnat  only 
one  diode  changes  regions.  So  instead  of  using  a  strict  piecewise 
nonlinear  approach,  the  algorithm  in  Fig.  3.12(a)  was  modified  to 
eliminate  the  limit  cycle  problem.  The  flow  chart  for  the  modified 
algorithm  is  given  in  Fig  3.12(b). 

Three  test  circuits  were  used  to  test  this  new  iteration  scheme. 
These  three  circuits  are  given  in  Fig.  3.13(a),  (b)  and  (c),  and  the  data 
are  given  in  Table  3.1.  These  test  results  show  that  the  new  iteration 


scheme  is  superior  to  the  Colon  method  used  in  SPICE2. 
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Fig. 


3.12(a)  Flowchart  for  the  New  Iteration  Scheme. 
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Fig.  3.13  Example  Circuits  (a)  One  Transistor  Amplifier. 

(b)  TTL  NAND  Gate. 

(c)  Differential  Amplifier. 
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Tab  la  3.1  Comparison  of  the  Results  between 
the  New  Approach  and  SPICE2. 


Circuit 

Number  of  iterations 

Number  of  iterations 

(New  approach) 

(SPICE2) 

Fig.  3.13(a) 

3 

6 

Fig.  3.13(b) 

7 

17 

Fig.  3.13(c) 

6 

7 
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3.5.  Piecewise  Nonlinear  Approach  for  Bipolar  Devices  Including  Avalanche 
Effects 


Although  the  avalanche  charateristic  of  diode  should  consist  of  two 
separate  exponential  functions  [35],  in  order  to  simplify  the  analysis, 
here  the  avalanche  characteristic  is  chosen  to  consist  of  only  one 
exponential  function.  The  diode  I-V  static  characteristic  used  here  is 
shown  in  Fig.  3.5. 

In  the  forward  biased  region  the  equation  for  the  diode  current  is 

Xd  8  V*Vd/Vt  -  1}  (3.27) 

The  reverse-biased  current  before  breakdown  is 


Id  -  GdVd 

The  avalanche  current  is 


(3.28) 


I 

d 


_eA(VB  -  B*Vd) 


(3.29) 


The  constants  A  and  B  are  determined  from  the  I-V  characteristic  curve, 
where  Vg  is  the  breakdown  voltage  and  Vd  is  the  junction  voltage. 


If,  in  order  to  hasten  the  convergence,  strict  current  iteration  is 
U3ed  in  pieces  I  and  III,  then  divergence  may  be  encountered  as  shown  in 
Fig.  3-5.  Therefore,  the  piecewise  nonlinear  approach  for  bipolar  devices 
with  avalanche  modeling  is  as  follows: 


( 1 )  choose  Vq  equal  to  VREE  and  piece  III; 


(2)  find  iterate  solution  Vk+1  by  the  new  modified  Newton  method.  If 
V^+i  is  within  piece  III,  repeat  this  step. 
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(3)  otherwise  go  into  piece  II,  choose  V^+1  =  0  and  use  the  new 

derivative,  if  is  within  piece  II,  then  solution  is  fo. 

(4)  otherwise,  go  into  piece  I,  choose  =  Vg,  and  use  the  new 

modified  Newton  method. 

Remark :  Only  one  nonlinear  device  is  allowed  to  change  its  region  at  one 
iteration,  otherwise,  limit  cycle  problems  may  occur. 


3.6.  Piecewise  Nonlinear  Approach  for  MOSFET 


Let  us  consider  the  simple  resistor  load  MOS  inverter  which  are  shown 
in  Fig.  3.14(a).  The  nodal  equation  is 


V  -  V, 


F(V)  =• 


DD 


b(< 


VIN  -  VV 


(3.30) 


By  Taylor  series  expansion  we  obtain 


k+1 


F(»k+1)  =  F(Vk)  .  F ' (Vk)  <Vk+1  -  Vk> 

>)  <Vl 


+  higher  order  terms 
where  f'^)  (Vfc+1  -  Vk)  =  [  —  +  8  (VIN 


V  -  V,  , 

T  k 


F  (V,  ) 


-9 


<Vl  '  V  "  T  <Vk+l  ‘  V  • 


Vk)2  (3.31) 

V  )  and 


If  we  assume  R  is  sufficiently  large,  then 

the  third  term  in  Eq.  (3.31)  -  (V.,-  *  V) 

_ _  -  _  k  (3.32) 

the  second  term  in  Eq.  (3.31)  2 (yIN  '  VT  _  Vfc) 


If  *  V^)  is  not  small  compared  to  2(VIN  -  VT  -  V^),  then  the 

assumption  that  higher  order  terms  in  Eq.  (3-12)  are  small  and  can  be 
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Fig.  3.14(a)  Resistor  Load  MOS  Inverter  Circuit. 

(b)  Saturated  Load  MOS  Inverter  Circuit. 

(c)  Depletion  Load  MOS  Inverter  Circuit. 


neglected  is  not  true,  so  the  correction  term  4V,  =  V,  . ,  -  V.  obtained  by 

R  k+1  R 

the  Newton-Raphson  method  is  not  good.  This  may  result  in  very  slow 
convergence.  Fig  3.15(a)  illustrates  this  problem.  If  the  initial  guess 
is  Vq,  then  the  first  iterate  solution  is  V^,  which  is  far  away  from  the 

k 

exact  solution  V  .  Because  the  derivative  is  large  when  is  negative,  so 

k 

it  requires  a  large  number  of  iteration  to  converge  to  V  .  Fig.  3.15(b) 
and  (c)  illustrate  the  slow  convergence  problem  with  a  saturated  load  MOS 
inverter  circuit  and  a  depletion  load  MOS  inverter  circuit  as  shown  in  Fig. 
3- 14(b)  and  (c)  respectively. 

The  above  slow  convergence  problem  can  be  resolved  by  using  the  piece - 
wise  nonlinear  approach-  For  example,  for  the  resistor  load  MOS  inverter 
circuit,  first,  the  MOSFET  characteristic  is  partitioned  into  two  pieces  as 
shown  in  Fig.  3-16,  the  first  piece  is  from  -<»  to  zero,  the  second  piece 
is  from  zero  to  +ao,  then  the  circuit  can  be  solved  as  illustrated  in  Fig. 

3.16.  The  initial  guess  VQ  is  in  piece  II,  the  first  iterate  solution 

A 

is  in  piece  I,  so  is  chosen  to  be  the  breakpoint  zero.  Since  V?  is 

k 

within  piece  II,  the  Newton-Raphson  method  is  used  to  find  the  solution  V  . 

Remark :  In  the  iteration  scheme  for  MOS  circuits,  the  piecewise  nonlinear 

approach  is  used  for  VDS.  The  change  of  VGg  and  V  GD  are  limited  by  IV  at 

each  iteration. 

3.7.  Discussion 

Two  large  circuits  were  used  to  test  the  piecewise  nonlinear  approach, 
one  is  a  bipolar  circuit  as  shown  in  Fig.  3-17,  the  other  is  a  MOS  circuit 
as  shown  in  Fig. 3. 18.  The  data  are  given  in  Table  3*2.  The  results  show 

that  this  method  improves  the  convergence  property  of  the  basic 


Fig.  3.16  Example  of  the  Piecewise  Nonlinear  Approach  for  the 
Resistor  Load  MOS  Inverter  Circuit. 


Lran_fT_rLr+ 


i.rTjai_rHI' 
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Table  3.2  Comparison  of  the  Results  between 
the  Piecewise  Nonlinear  Approach 
(PWNL)  and  SP1CE2 . 


Circuit 

Number  of  iterations 
v'PVNL) 

Number  of  iterations 
(SPICE2) 

Fig.  3.17 

34 

48 

Fig.  3.18 

10 

28 

Newton-Raphson  method;  however,  the  proof  of  the  global  convergence 
property  or  the  conditions  for  global  convergence  have  not  yet  been 
derived.  More  research  work  on  this  topic  is  needed. 
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IV.  NUMERICAL  INTEGRATION 


A  numerical  integration  method  is  required  to  determine  the  transient 
response  of  a  circuit.  In  order  to  make  the  numerical  integration  more 
accurate  and  efficient,  some  method  of  dynamically  varying  the  timestep  is 
needed,  this  is  usually  accomplished  by  a  local  truncation  error  (LTE) 
timestep  control. 


Let  us  denote  the  upperbound  on  the  local  truncation  error  by  ET.  In 

previous  work,  ET  was  established  as  follows  [36].  First,  a  maximum 

allowable  global  truncation  error  GE  and  the  solution  interval  T  are 

max 

specified.  An  assumption  that  this  global  error  is  distributed  uniformly 
within  T  is  made,  then  the  maximum  allowable  ET  per  timestep  (h)  is  given 
by 


ET  =  ^Pax  •  h  (4.1) 

T 

The  LTE  timestep  control  with  trapezoidal  integration  is  implemented 
as  follows.  First,  the  timestep  h  and  tn+^  =  tn  +  hR  are  determined,  the 
solution  at  the  timepoint  tn+^  is  found,  then  the  local  truncation  error 
(LTE)  is  evaluated  by  Eq.  (4.2). 


LTE 


4—-  *  x(t  )  M  JL.  dd3 

i2  * 


(4.2) 


where  DD3  is  the  3rd  divided  difference  [1]  and  t  <  t  <  t  ...  The  kth 

n  -  n+i 

divided  difference  is  defined  by  the  recursive  relation 


DDk  = 


DDk-i<W 


“WV 


DDL 


x(tn4-l)  •  x(Cn} 


(4.3) 


i-1  hn+l-i  n 

If  LTE  >  ET,  then  the  timestep  is  considered  too  large,  h  is  rejected,  a 

n 


new  is  computed  using 
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*/ 


2*GE 


max 


T*DD3 

and  then  a  new  timepoint  t 


n+1 


(4.4) 


is  determined.  If,  on  the  other  hand, 


LTE  <  ET,  then  the  local  truncation  error  at  timepoint  t  is  considered 

n+l 


satisfactory,  and  the  timestep  h  is  computed  using 

--■■■■  n+l 


n+l 


7 


2*GE 


max 


T*DD3 


(4.5) 


4.1.  Problems  with  Previous  Work 

When  the  above  strategy  is  applied  to  determine  the  transient  response 
of  a  circuit,  there  are  two  problems: 

(1)  Since  DD3  is  only  an  approximation  of  *(t)  ,  whenever  the  timestep 

is  changed  or  the  input  signal  changes  abruptly,  our  investigation  shows 

that  DD3  becomes  an  inaccurate  estimate  of  LTE.  This  inaccuracy  results  in 

the  following  unwanted  situations.  One  situation  is  that  at  the  timepoint 

t  j.,  if  LTE  <  ET,  the  timestep  hft+j_  is  increased,  but  at  the  next 

timepoint  t  ,  due  to  the  inaccuracy,  LTE  is  now  found  to  be  larger  than 
n+2 

ET,  30  this  timepoint  is  rejected  and  the  timestep  is  reduced.  The  other 
situation  is  even  worse.  If,  at  the  timepoint  tQ  ,  the  input  changes 
abruptly,  then  due  to  the  inaccuracy  of  the  DD3  approximation  to  x(t),  LTE 
may  be  greater  than  ET  and  the  timestep  is  reduced.  Sometimes  this  happens 
repeatedly  until  the  timestep  becomes  too  small  and  the  program  terminates. 
These  two  situations  are  explained  in  detail  later. 

(2)  For  digital  circuits,  the  total  solution  time  T  may  consist  of 
several  switching  intervals.  If  a  stable  numerical  integration  method  is 
used,  initially  in  an  interval  the  local  truncation  error  accumulates  and 
the  global  truncation  error  (GE)  increases,  but  as  the  solution  nears 


1 
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I 

I 

I 

I 


steady  state  in  a  given  switch  interval  the  global  error  decreases,  and  as 
the  solution  approaches  the  steady  state  the  global  error  goes  to  zero,  so 
that  the  upperbound  ET  given  by  Eq.  (4.1)  is  too  conservative.  This  i3 
illustrated  in  Fig.  4.1. 

Now  we  will  consider  the  above  two  problems  in  more  detail.  The  first 
situation  of  the  first  problem  can  be  illustrated  by  a  simple  RC  circuit  as 
shown  in  Fig.  4.2.  In  order  to  simplify  the  analysis,  the  backward  Euler 
method  is  used.  The  exact  solution  for  this  circuit  i3 

v(t)  =  5e't/T  (4.6) 


the  solution  obtained  by  the  backward  Euler  method  is 


0+1  "  l  +  Vr 


(4.7) 


where  v  =  5V  . 
o 


The  local  truncation  error  estimates  at  timepoints  t  ,  and  t  are 

n+1  n 


LTE  -  h*  *  DD2  . . 
n+1  n  n+1 


LTE  a  ti  t  *  DD2 
n  n- 1  n 


where  DD2 


v  ,  -  -  v  v  -  v  , 
n+1 _ n  n  n-1 


n+1  h 


n-1 


DD2 


h 

+  h  . 

n 

n-1 

v  -  v  , 

v  *  -  v  „ 

n  n-1 

n-1  n-2 

h  , 

h  „ 

n-1 

n-2 

h  .  +  h  - 
n-1  n-2 


(4.8) 

(4.9) 


Fig.  4.2(a)  Simple  RC  Circuit. 

(b)  Waveform  of  the  Simple  RC 
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Let  ua  consider  the  situation  when  h  „  =  h  ,  =  h  and  h  =  ah,  where 

n-2  n-1  n 

a  is  a  ratio  constant.  From  Eqs,  (4.7),  (4.8)  and  (4.9),  we  obtain 


LTE 


n+1 


DD2 


n+1 


LTE 


DD2 


n-1 


*  a2*[ 
a+1  1 


(4.10) 


If  both  LTE  and  LTE  are  good  approximations  of  the  true  local 
n+1  n 

truncation  errors,  then  from  Eq.  (4.6),  (4.7)  and  the  definition  of  local 


truncation  error  [36]  the  ratio  of  LTE  ,  over  LTE  should  be 

n+1  n 


LTE 

Ira 


n+1 

n 


AW 


a2*[ 


(4.11) 


Comparison  of  Eq.  (4.11)  with  Eq.  (4.10)  shows  that  the  ratio  computed  by 

2  ^ 

Eq.  (4.10)  is  wrong  by  a  factor  of  When  a  =  1,  that  is,  the  timestep 

is  constant,  then  the  estimation  by  Eq.  (4.8)  is  good.  When  a  is 

different  from  1,  then  the  estimation  by  Eq.  (4.8)  is  not  good.  Table  4.1 

gives  the  simulation  results  of  the  simple  RC  circuit,  which  confirms  the 

-3 

above  conclusion.  The  ET  used  is  10  V.  At  the  first  three  timepoi.its, 
the  timesteps  are  kept  constant,  so  the  estimation  by  Eq.  (4.8)  is  good. 
At  the  fourth  timepoint  the  timestep  is  increased  by  a  factor  of  two.  The 
true  local  truncation  error  is  O.8930E-3,  which  is  an  acceptable  error; 
but  the  estimation  by  Eq.  (4.8)  is  0.1207E-2,  which  is  larger  than  ET,  so 
the  timepoint  is  rejected. 


Now  we  would  like  to  see  if  this  inaccuracy  can  be  explained  by  the 
above  conclusion.  The  ratio  of  the  estimation  at  the  fourth  timepoint  over 
the  estimation  at  the  third  timepoint  is 


Table  4.1  Simulation  Results  of  the  Simple 
RC  Circuit. 


t  (ns) 

h(timestep) 

— 

LIE (estimate) 

True  local 
truncation  error 

m 

0.25 

0.2355E-3 

0.2339E-3 

1.75 

0.25 

0.2332E-3 

0.2314E-3 

! 

;  2.00 

0.25 

0.2309E-3 

0.2293E-3 

2.50 

0.50 

0. 1207E-2 

0.8930E-3 
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0. 1207E-2  . 

0. 2309E-3  " 


(4.12) 


instead  of  4  as  predicted  by  Eq.  (4.11).  However,  note  that  for  a  =  2 

^rr  -  ^  -  1.333  %  (4.13) 

a+1  3  4 

Eq.  (4.13)  shows  that  the  estimation  of  local  truncation  error  is  really 

2<n 

wrong  by  a  factor  of  — as  shown  in  Eq.  (4.10). 


Now  let  us  consider  the  second  situation  of  the  first  problem.  This 
situation  can  be  illustrated  by  a  simple  RC  circuit  as  shown  in  Fig.  4.3. 
Again  the  backward  Euler  method  is  used  here  for  the  simplicity  of  the 
analysis.  The  exact  solution  for  this  circuit  is 


5e 


■t/T 


tf  t 


v(t) 


v  e  C^T  +  5(1  -  e 
n 


(4.14) 


t  >  t 


the  solution  obtained  by  backward  Euler  method  is 


l  +  Vr 


4 


i+iv/t  +r?iv; 


k  <  n 


kin 


(4.15) 


where  v  =  5V . 
o 


In  the  early  version  of  SPICE2,  when  tn  exceeded  a  source  breakpoint, 

then  h  was  reduced  such  that  the  value  t  coincides  with  the  breakpoint. 
n-1  n 

The  timestep  was  reduced  to  a  small  value  and  then  the  iteration  was 

■» 

* 


Fig.  4.3(a)  Simple  RC  Circuit. 

(b)  Waveform  of  v  . 

in 

(c)  Waveform  of  v 
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continued.  Let  us  consider  the  situation  when  h  _  =  h,  h  ,  =  ah  and 

n-£  n-i 

h  =  bh,  where  a  and  b  are  ratio  constants.  From  Eqs.  (4.8)  and  (4.15), 
n 

we  obtain  the  following  estimate  of  the  local  truncation  error: 


2  2  5  b(vn+l‘5) 

LTE  =  b2h2  [  —r  5  - ] 

n+1  t(a+b)h  T2^a+b) 


(4.16) 


Let  us  also  assume  that  bh  is  not  small  enough,  so  that 


LTEn+l  >  ET 


(4.17) 


Then  the  timestep  was  reduced  and  a  new  timestep  ch  was  computed  by 


2,2 . 

c  h  [ 


b<Vn+l-5^ 


T(a+b)h  t2 (a+b) 


3  =  ET 


(4.18) 


where  c  <  b. 


The  local  truncation  error  for  this  new  timestep  ch  is  estimated  by 

5 


,  2. 2  r  5  c(vn+l'5) 

LTEn+l  “  c  h  t  T(a+c)h  +  _2 


r(a+c)h  T2(a+c) 


(4.19) 


It  follows  that 


LTE' 


n+1 


a+b  [ 1 + 


ch(vn+l~5). 
5t  J 


ET 


a+c  [1+  bh(vn+l~5)1 


5  T 


(4.20) 


Since  c  <  b  then  (a+c)  <  (a+b)  and  for  a  small  enough  a  the  ratio  in 
Eq.  (4.20)  could  be  greater  than  one.  If  this  is  the  case,  then  the  step 
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size  will  be  reduced  again.  This  could  happen  again  and  again.  This  was 
the  case  in  early  version  of  the  SPICE2  program,  and  frequently  after 
abrupt  clock  or  signal  changes  the  program  would  not  converge  and  the  job 
would  be  terminated  prematurely. 

Remark :  In  Eq.  (4.16),  the  second  term  is  an  approximation  to  the  true 
local  truncation  error,  the  first  term,  although  it  is  dominant,  is  a 
parasitic  term  which  is  generated  by  the  use  of  voltages  at  the  timepoints 
of  the  previous  switching  interval. 


The 

second  problem  is  detailed 

as 

follows . 

Let 

ffi  denote 

max 

the 

maximum 

global  truncation  error 

at 

tQ  (Fig. 

4.1). 

Assume  that 

the 

trapezoidal  method  is  used  and  that 

we 

are  dealing 

with 

an  exponentially 

decaying  waveform.  The  local  truncation  error  at  the  timepoint  tn+1  is 
given  by 


LIEn+l  -iT  *<''> 


t  £  t  S  t  ,  . 
n  n+1 


(4.21) 


and  for  this  example 


LTE 


n+1  ~  3  n,max 

12t 


(4.22) 


From  Eq.  (4.22)  and  the  definition  of  local  truncation  error,  we  obtain 

.3 

(4.23) 


-h  /r  h' 

GE  ssGE  e  n  +  -~r  V  *  GE 

n+1  max  , „  3  n,max  max 

12t 


where  GE  .  is  the  global  truncation  error  at  the  timepoint  t  .  Eq. 
n+1  n+1 

(4.23)  can  be  reduced  to 

hn  hn 

UT3  Vn,max  *  GEmax(-T^ 


(4.24) 
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The  local  truncation  error  timestep  control  requires  that 

.3 


LTE 


V 


n+1  “  12t3  n.max 


-  ET 


(4.25) 


Assuming  equality  in  Eq.  (4.25)  and  eliminating  hn/T  in  Eq.  (4.24),  we 
obtain  the  following  upper  bound  on  the  local  truncation  error. 


ET 


R 


(4.26) 


n,max 

In  order  to  check  the  above  bound  which  was  derived  for  RC  circuits,  a 


number  of  digital  circuits  were  simulated  and  the  following  empirical  bound 


on  the  local  truncation  error  was  determined. 


ET 


10 


) 


3 


(4.27) 


where  V  is  the  voltage  swing  which  is  the  supply  voltage  in  this  example. 
DD 

Eq.  (4.27)  also  holds  for  exponentially  rising  waveforms.  Given  and 

VDD  ’  then  ET  Can  be  determined  b7  Ecl*  (4.27).  and  then  Eq.  (4.2)  can  be 

used  to  control  the  timestep.  ^ 

Eq.  (4.26)  shows  that  ET  is  proportional  to  (GE  )  for  RC  circuits 
if  the  trapezoidal  method  is  used.  In  general,  for  RC  circuits,  if  a  stable 
numerical  integration  method  of  order  n  is  used,  similar  derivation  as  used 
above  can  show  that 


ET<*(GE  )<n+1)/n  (4.28) 

max 

Eq.  (4.28)  was  verified  experimentally  for  the  backward  Euler  method  and  the 
trapezoidal  method.  The  RC  circuit  as  shown  in  Fig.  4.2  was  used-  The  simu¬ 
lation  results  are  given  in  Figs.  4.4  and  4.5. 
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4.2.  Algorithm 

The  following  algorithm  for  variable  timestep  control  was  developed 
based  on  the  discussion  in  the  previous  section.  The  trapezoidal  method  is 
assumed.  ET  is  computed  by  Eq.  (4.27). 


First  let  us  derive  an  expression  for  the  estimation  of  the  local 


truncation 

error 

which  is  used 

in 

the  algorithm. 

Let  us 

assume  that 

*\i-2  =  h  ’ 

II 

r—f 

1 

sF 

ah  and  h  =  bh. 
n 

The 

solution  obtained 

by  the 

trapezoidal 

method  for  an  exponentially  decaying  waveform  is 


I  -  hn/2T 

v  =  v  - 

n+L  n  1  +  h  /2T 
n 

the  3rd  divided  difference  is  given  by 


(4.29) 


DD3 


DD2  , .  -  DD2 
n+1  n 


n+1  h  « +  h  i+h 
n-2  ti-1  n 


n 


2<l4*rt)  -§|)<l  +  f|) 


(4.30) 


where  -  is  the  error  factor  in  the  estimate  of  the  third 

2(l+a+o) 


derivative  caused  by  varying  the  timestep. 


By  taking  into  account  the  effect  of  different  timesteps,  the  expression  of 
local  truncation  error  is  given  by 


LIE„+i’  f-  *  DD3»i  *  l^f1 


(4.31) 


i 

■4 
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or  we  can  define  a  new  quantity  DD3'  which  is  given  by 


DD3 ' 


n+1 


PD2n+l  -  DD2n 
<hn-2+hn> 


(4.32) 


then  Eq.  (4.31)  can  be  reduced  to 


LTE 


n+1 


t  *  DD3n+l 


(4.33) 


Algorithm: 


(1)  Record  the  initial  time  t„,  final  time  t  .  minimum  stepsize  h  .  , 

0  f  mm 

maximum  stepsize  h  ,  and  source  breakpoints, 
max 

(2)  Set  the  initial  timestep  h  =  l\,in . 

(3)  Compute  Xj^  at  t^  =  t0  +  h,  Xg  at  t2  =  tg  +  2h,  and  X3  at 
t3  =  tQ  ♦  3h. 


(4)  Set  n  =  3  and  compute  LTE  by  Eq.  (4.33). 

3[et~ 

(5)  Compute  hn  =  If  \  <  0.6h,  then  h  =  and  go  to  (3); 

otherwise,  continue. 


(6)  Compute  fcn+l  =  fcn  +  V  If  Vl  does  not  exceed  a  source 

breakpoint,  then  go  to  (7).  If  tn+1  exceeds  a  source  breakpoint,  then  h^ 

is  reduced  such  that  the  value  fcn+l  coincides  with  the  breakpoint.  Compute 

X^  for  this  breakpoint.  Compute  LTE  by  Eq.  (4.33),  compute  h^j^ ,  if 

h_L,  <  0.6h  .  then  h.  =  h  , ,  and  go  to  (6);  otherwise,  set  h  =  h  .  and 
n+L  n  n  n+i  ^  mm 

tQ  s  t^| y 9  then  go  to  ( 3 ) • 


(7)  Compute  Compute  LTE  by  Eq.  (4.33),  compute  hnJ_, ,  if 

<  0  . 6hn,  then  hn  =  h^  and  go  to  (6);  otherwise,  continue. 

(8)  If  t^  >  t^,  then  stop;  if  not,  then  n  =  n+1 ,  and  go  to  (6). 

Remark :  The  above  algorithm  has  been  derived  for  a  fixed  order  variable 
stepsize  method  which  uses  the  trapezoidal  rule.  Our  simulation  results 
show  that  the  problems  we  mentioned  before  in  this  chapter  are  resolved  by 

this  algorithm.  If  other  fixed  order  methods  are  to  be  used,  then  the 

corresponding  equations  should  be  modified. 


V.  TEARING  METHODS  AND  SPARSITY  CONSIDERATIONS  FOR  NODE  TEARING  METHOD 


There  are  two  kinds  of  tearing  methods  -  the  branch  tearing  method  and 
the  node  tearing  method.  The  idea  of  branch  tearing  was  first  introduced 
by  Kron  [12].  Recently  Chua  and  Chen  [16]  have  shown  that  the  branch 
tearing  is  just  a  special  case  of  generalized  hybrid  analysis.  The  main 
idea  of  branch  tearing  is  to  select  a  set  of  tearing  branches  first,  then 
the  given  network  is  torn  apart  into  several  subnetworks  by  removing  these 
tearing  branches  (Fig.  5.1),  analyzing  each  subnetwork  separately,  and 
obtaining  the  solution  of  the  entire  network  by  combining  the  solutions  of 
the  subnetworks  via  the  tearing  branches.  Algebraically,  this  method  is 
equivalent  to  a  particular  ordering  of  the  hybrid  analysis  equations  such 
that  the  resulting  matrix  has  a  bordered  block-diagonal  structure  (Fig. 
5.2).  Each  block  corresponds  to  a  subnetwork,  and  the  border  corresponds 
to  the  interconnections  of  the  subnetworks. 

The  idea  of  node  tearing  was  first  introduced  by 
Sangiovanni-Vincentelli,  Chen  and  Chua  [20].  The  main  idea  is  to  select  a 
3et  of  tearing  nodes  first,  then  the  network  is  torn  apart  into  several 
subnetworks  by  removing  these  tearing  nodes  (Fig.  5.3)>  each  subnetwork  is 
analyzed  separately,  and  the  solution  of  the  entire  network  is  obtained  via 
the  tearing  nodes.  Algebraically,  this  method  is  equivalent  to  a 
particular  ordering  of  nodal  analysis  equations  such  that  the  resulting 
matrix  has  a  bordered  block-diagonal  structure  (Fig.  5.^).  Each  block 
corresponds  to  a  subnetwork,  and  the  border  corresponds  to  the 
interconnections  of  the  subnetworks. 


Fig.  5.1  Example  of  a  Network  Partitioned  into  Three 
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Fig.  5.2  Bordered  Block-Diagonal  Matrix  Formulated 
by  Branch  Tearing. 


? 
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FP- 7000 


Fig.  5.3  Example  of  a  Network  Partitioned  into  Three 
Subnetworks  by  the  Node  Tearing  Method. 
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Recently ,  because  tearing  methods  possess  several  advantages  over 
conventional  circuit  analysis  methods,  a  lot  of  effort  has  been  devoted  to 
the  study  of  tearing  methods  for  the  analysis  of  large  scale  circuits.  The 
advantages  of  tearing  methods  are  as  follows:  First,  tearing  methods  are 
suitable  for  the  exploitation  of  the  repetitiveness  of  a  limited  number  of 
subnetworks;  secondly,  tearing  methods  are  suitable  for  the  exploitation 
of  latency;  thirdly,  tearing  methods  are  suitable  for  parallel  processing. 
In  order  to  solve  the  network  by  tearing  methods  one  must  specify  a 
partitioning  strategy,  and  also  one  must  specify  a  technique  for  solving 
the  partitioned  equations.  In  the  literature  mostly  the  branch  tearing 
method  has  been  used  to  solve  large  networks  [22-24].  The  solution 
strategy  has  been  to  estimate  the  current  or  voltage  at  each  tearing  port 
and  to  excite  the  torn  subnetworks  with  independent  sources  at  these  ports. 
The  remaining  port  responses  are  computed,  and  are  substituted  into  the 
interconnection  equations.  If  these  equations  are  not  satisfied,  then 
another  estimate  is  made  of  the  variables  chosen  as  port  excitations.  This 
iterative  procedure  continues  until  convergence  is  achieved.  If  the 
subnetworks  are  nonlinear  a  multilevel  iteration  scheme  is  used,  such  as  a 
Gauss-Seidel  [22],  Newton-SOR  [24]  or  a  multilevel  Newton  iteration  [42]. 
However,  the  first  two  iteration  schemes  do  not  have  second-order 
convergence  while  the  third  scheme  requires  the  computation  of  an 
additional  Jacobian  matrix. 

Because  the  above  approach  introduces  new  variables,  such  as  tearing 
branch  currents,  the  complexity  of  the  problem  is  increased.  Also,  a 
multilevel  iteration  scheme  is  required.  Another  disadvantage  of  the 
branch  tearing  method  is  that  each  subnetwork  must  contain  the  datura  node 
for  the  network,  or  else  a  local  datum  node  must  be  chosen  for  each 
subnetwork.  In  the  program  SLATE  a  different  approach  is  used,  and  the 
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internal  subnetwork  variables  are  eliminated  from  the  tearing  node 
equations.  Then  the  tearing  node  voltages  are  computed.  Only  a  one  level 
Newton  iteration  is  required,  and  the  internal  variables  for  each 
subnetwork  can  be  eliminated  using  parallel  processing  methods.  The 
elimination  of  the  internal  variables  is  equivalent  to  replacing  the 
subnetworks  by  Norton  equivalent  circuits  at  the  tearing  nodes. 

In  our  study  of  tearing  methods  it  was  assumed  that  the  user  specifies 
the  subnetworks,  and  any  part  of  the  network  not  specified  as  a  subnetwork 
is  automatically  included  in  the  subnetwork  called  rest  of  network  in  Tigs. 
5.1  and  5"3*  The  subnetworks  are  processed  first  in  the  solution 
algorithm,  and  the  tearing  branches  or  nodes  along  with  the  rest  of  network 
equations  are  processed  last.  Thus,  the  branch  tearing  method  and  the  node 
tearing  method  described  in  Section  5.1  and  Section  5*2  are  somewhat 
different  from  those  found  in  the  literature  r_12-14j.  In  Section  5-3,  a 
comparison  between  branch  tearing  and  node  tearing  is  given.  In  Section 
5.4-,  the  derivation  of  the  construction  of  the  node  tearing  matrix  from 
subnetworks  is  detailed.  The  sparsity  considerations  for  the  node  tearing 
method  are  presented  in  Section  5-5.  The  implementation  of  node  tearing  is 
described  in  Section  5-6.  The  circuit  interpretation  of  the  tearing 
methods  is  given  in  Section  5*7.  Some  conclusions  are  given  in  Section 
5.3. 


5.1.  Derivation  of  the  Branch  Tearing  Method 

Let  N  be  a  connected  network  having  (n+1 )  nodes:  the  datum  node  n^ 

and  the  nondatum  nodes  a-  *  (n,,n, . n  },  and  b  branches, 

i  i.  n 

0  *  {  , b2  . . b^}.  the  branch  voltages,  branch  currents  and  the 

T 

node- to-da turn  voltages  be  denoted  by  e  »  (e,  ,e_, . e,  ) , 

12  b 


i. 
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T  T 

i  =*  ib)  and  v  -  (v^  ,v^  , . vn)f  respectively.  Let  the 

interconnection  be  defined  as  the  remaining  part  of  the  network  when  all 
the  subnetworks  are  removed,  that  is,  the  set  of  tearing  branches  and  the 
rest  of  the  network  in  Fig.  5*1.  Let  us  assume  that  a  proper  set  of 
tearing  branches  has  been  chosen  such  that  there  is  no  mutual  coupling 

either  among  the  torn  subnetworks  or  between  the  torn  subnetworks  and  the 
interconnection,  and  that  all  subnetworks  contain  the  common  datum  node. 
This  latter  assumption  is  made  in  order  to  avoid  floating  subnetworks  which 
result  in  singular  submatrices.  The  more  general  case  when  all  subnetworks 
do  not  contain  a  common  datum  node  is  discussed  in  [l?].  Subscripts  3,  t 
and  r  are  used  to  denote  quantities  pertaining  to  the  subnetworks,  the 
tearing  branches  and  the  remaining  branches,  respectively,  so  the  branch 
set  3  is  partitioned  into  three  subsets  3  ,  9^  and  9  (Fig.  5.1),  and  the 

node  set  a  is  partitioned  into  two  subsets  o^and  Q'r .  This  yields  the 

following  special  structures  for  the  reduced  incidence  matrix  A  and  the 
branch  conductance  matrix  G  of  network  N: 


(5.1) 


(5.2) 


-102- 


Let  the  torn  network  have  k  subnetworks 

and  B  be  partitioned  correspondingly  into  k  subsets  a  ,  & 
s  12 

3  ,  Q  . , 3 '  respectively.  With  this  partitioning,  the 

L  c.  K 

incidence  matrix  A  can  be  written  as 


the  branch  conductance  matrix  G  can  be  written  as: 


...Nk.  Let  os 

. . *0,  and 

k 

node-to-brancn 


(5.3) 


(5.4) 


J 


The  network  variables  are  constrained  by  the  Kirchhoff’s  current  law 
(KCL),  Kirchhoff's  voltage  law  (KVL)  and  the  branch  constraint  relations 


(BC)  [28], 


(KCL) 

A  i 

+  A  i  =  0 

(5.5) 

/v-Crwt  *** 

A 

i 

*  A  i  =  0 

(5.6) 

(KVL) 

e 

.. 

ATv 

(5.7) 

~s 

e 

T  T 

A  v  +  A  v 

(5.8) 

~c 

~t~s  ~rt~r 

e 

s 

,T 

A  v 

(5.9) 

~r 

~rc~r 

(BC) 

i 

s 

I  +  G  e  - 

G  e 

(5.10) 

<vSS  «w9s*S 

s 

e 

e„  +  Z.1  - 

Z  i 

(5.11) 

~Cs  ■vb'.t 

i 

I  +  G  e  - 

G  e 

(5.12) 

~r 

~rs 

~F«-rs 

Substituting  Eqs.  (5.7),  (5-8)  and  (5.9)  into  Eqs.  (5.10)  and  (5.12),  and 
then  substituting  the  results  into  Eqs.  (5.5)  and  (5.6),  we  obtain 


A  G  ATv  +  A  i  =  J 

~*9**9+*9**'9  ~SS 

A  i  +  A  G  AT  v  =  J 
~rt~t  ~rr~p^ri>^<r  ~rs 

Substituting  Eq.  (5.8)  into  Eq.  (5.11),  we  obtain 


ATv 

~t~3 


-Zi  +  AT  v  =  E 
~6~t  ~rfi~r  ~ts 


where  J  =  A  G  e  -AI  , 

/■s-9  S  ^9v9v33 


J  =  A  G  e 
~r  s  ~rr~i>-r  s 


-  A  I 
~rr'-rs 


(5.13) 

(5.14) 


(5.15) 


where  Y  .  =  A  .G  ,AT.  j=1,2, . ,k, 

~S  J  ~Sj«SJ~3J 

30  Eq  (5.16)  can  be  rewritten  as: 
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T 

where  Y  =  A _ G_A..  .  Eq.  (5-18)  ia  the  resulting  matrix  by  branch  tearing 

~r  ~ri>-is-rr 

method,  which  has  the  desirable  bordered  block-diagonal  structure  and  it  is 
a  particular  ordering  of  the  hybrid  analysis  equations. 


5.2.  Derivation  of  the  Node  Tearing  Method 

Let  the  interconnection  be  defined  as  the  remaining  part  of  the 

network  when  all  the  subnetworks  are  removed,  that  is,  the  set  of  tearing 

nodes  and  the  rest  of  network  in  Fig.  5-3-  Let  us  assume  that  a  proper 

set  of  tearing  nodes  has  been  chosen  such  that  no  coupling  exists  either 

among  the  torn  subnetworks  or  between  the  torn  subnetworks  and  the 

interconnection.  The  node  set  a  is  partitioned  into  three  subsets  arg,  a 

and  a  ,  and  the  branch  set  3  is  partitioned  into  two  subsets  8  and  3 
r  s  r 

(Fig.  5-3).  This  yields  the  following  special  structures  for  the  reduced 
incidence  matrix  A  and  the  branch  conductance  matrix  G  of  the  network  N: 


(5.19) 


(5.20) 
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Let  the  torn  subnetwork  have  k  subnetworks  N N^, . ,N^.  Let 

a  and  3  be  partitioned  correspondingly  into  k  subsets  . ,  a, 

s  s  12k 

and  3y  3 2, . , 3 k  respectively.  With  this  partitioning,  the  node-branch 

incidence  matrix  A  can  be  written  as: 


*1  *2 


A  , 
— s  1 


~s2 


A  , 
~sk 


3 


N 


~tsl ~ts2 


A_  . 
~tsk 


A 

~tr 


The  branch  conductance  matrix  G  can  be  written  as: 


*1 

*1 

/ 

isl 

• 

• 

• 

CM 

CJ 

/ 

<au 

*2 

~s2 

i 

i 

• 

•  0 

1 

• 

1 

• 

• 

• 

•  0 

1 

• 

1 

to 

v>i 

1 

I 

r 

1 

• 

?o  1 

1 

1 

1 

rG"“ 

(5.21) 


(5.22) 


-107- 


The  network  variables  are  constrained  by  the  Kirchhoff’s  current  law 
(KCL),  Kirchhoff's  voltage  law  (KVL)  and  the  branch  constraint  relations 


(KCL)  A  i  =0 


A  i  +  A  i  =  0 
s~s 


A  i  =  0 
~r>-r  ~ 


(KVL)  e  =  ATv  +  AT  v 

rwS 


e  =  AT  v  +  ATv 
~r 


(BC)  i  =  I  +  G  e  -  G  e 

~S  ~SS  ~9~S  ~9~*SS 


i  =  I  +  G  e  -Ge 
~r  ~rs  ~r~r  ~i»»rs 


(5.23) 


(5.24) 


(5.25) 


(5.26) 


(5.27) 


(5.28) 


(5.29) 


Substituting  Eqs.  (5.26)  and  (5.27)  into  Eqs.  (5.28)  and  (5.29),  and  then 


substituting  the  results  into  Eqs.  (5.23),  (5.24)  and  (5.25),  we  obtain 


A  G  ATv  +  A  G  aJ  v  r  J 

~Sb*9**S**S  >v9^'Svt  9-wC  »S 


Afc  G  ATv  +  ( A  G  A*  +  A  G  aJ  )v  +  A  G  ATv  =  J 
~t9~s~9~s  ~ts~s~ts  ~tr~c~.tr  ~t  ~tr~r~r~r  ~ts 


A  G  A^  v.  +  A  G  ATv  =■  J 
~r~r~tr>~t  ~i~r^.r~r  ~r  s 


where  J  =  A  G  e  -  A  I  . 

~SS  ^S^SS 


J  a  A  G  e  -  A  I  +  A  G  e  -  A  I  , 
~ts  ~ts~9~ss  ~ta~ss  ~tr~p~rs  ~tr~rs 


and  J  =  A  G  e  -  A  I  . 
~r s  ~r~r~rs  ~r~rs 


Eqs.  (5.30),  (5.3D  and  (5.32)  can  be  rewritten  as: 


(5.30) 


(5.31) 


(5.32) 


:  ■irf'jvr'  . -< 
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A  G  AT 

<vS»wS*wS 


T 

A  G  A 

s 


x  g  at  a,,  g  a"  +a  g  aJ  a  ga! 

^,ts~a~s  ~ts~s~ts  ~tr~r~tr  ~tr~r~r 


T 

A  G  A* 
~r~r~tr 


T 

A  G  A 
~r~rr 


(5.33) 


Let  us  now  examine  the  term  A  G  A  in  Eq.  (5.33).  Substituting  the  A  of 

-wS 

T 

Eq.  (5.21)  and  G  of  Eq.  (5.22)  into  A  G  A  ,  we  obtain 

~S  ~9~9~S 


T 

A  G  A 

'**9+*9s*S 


‘I  2 


®1  2s  1 


(5.34) 


whereJ3j  =  As>^s>Asj  J  =  1-2 . k 


,  so  Eq.  (5.33)  can  be  rewritten  as: 


Y  Y 
~tsl  ■*ts2 


!*.ti 


Y  J  Y 
~sk  ~stk 


Y  Y 
‘  •  M:sk  l~tt 


(5.35) 
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where  Y  =  A  .G  .A*  j=1,2, . ,k 

~StJ 


, Y  .  =  A„  .G  .A  .  Jal,2, . ,k 

~tsj  ji-SJ  J 


, Y  =AGAT+AGAT 
~ct  ~ts~9— cs  ~er~r'-tr 


’~rt  =  ~ti&i£r 

■ire  ■  i£rAtr 


and  Y  =  A  G  A  . 
~rr  ~p^p^r 


Eq.  (5.35)  is  the  resulting  matrix  by  node  tearing  method,  which  has 
the  desirable  bordered  block-diagonal  structure  and  is  a  particular 
ordering  of  the  nodal  analysis  equations.  In  the  above  derivation  the 
modified  nodal  method  could  have  been  used.  In  this  case  the  vectors  v 
and  v^  consist  of  Koth  node  voltages  and  currents  of  branches  for  which  an 
admittance  description  presents  difficulties.  This  is  actually  the 
formulation  used  in  the  program  SLATE. 


5.3.  Comparison  of  the  Branch  Tearing  Method  with  the  Node  Tearing  Method 

As  described  above,  branch  tearing  is  equivalent  to  a  particular 
ordering  of  the  hybrid  equations,  node  tearing  is  equivalent  to  a 
particular  ordering  of  the  nodal  equations,  so  branch  tearing  requires  the 
use  of  tearing  branch  currents  as  extra  variables.  As  a  result  of  the 


•  t 
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above  property,  node  tearing  possesses  the  following  advantages  over  branch 
tearing: 

(1)  the  dimension  of  the  matrix  formulated  by  node  tearing  is  smaller 
than  that  formulated  by  branch  tearing; 

(2)  the  number  of  nonzero  entries  in  the  matrix  formulated  by  node 
tearing  is  smaller  than  that  formulated  by  branch  tearing  [20]; 

(3)  for  passive  networks,  node  tearing  generates  a  diagonally-dominant 
matrix,  so  any  application  of  the  Gaussian  elimination  method  with  diagonal 
pivoting  is  stable,  while  this  is  not  the  the  case  in  the  branch  tearing 
method; 

(4)  usually,  in  the  analysis  of  large  scale  circuits,  node  tearing 
preserves  the  identities  of  the  resulting  torn  subnetworks,  while  branch 
tearing  sometimes  destroys  the  identities  of  the  torn  subnetworks. 
However,  one  can  generate  examples  in  which  the  opposite  is  true,  but  these 
situations  were  not  encountered  in  our  examples. 

The  above  conclusions  are  not  conclusive;  although  the  dimension  of 
the  matrix  formulated  by  branch  tearing  is  larger  than  that  of  node 
tearing,  the  extra  nonzero  entries  are  either  +1  or  -1.  If  this  property 
is  fully  exploited,  then  node  tearing  may  not  be  so  advantageous.  But  full 
exploitation  of  the  property  that  extra  entries  are  either  +1  or  -1 
requires  a  much  more  complicated  sparse  matrix  technique.  So  node  tearing 
is  preferred  and  is  used  in  the  program  SLATE. 
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5.4.  Constructing  the  Node  Tearing  Matrix  from  Subnetworks 

In  Section  5.2,  the  derivation  of  node  tearing  is  given;  however,  in 
the  real  implementation  we  do  not  want  to  3olve  the  whole  matrix  equation 
at  one  time.  We  would  like  to  process  each  subnetwork  separately,  and  then 
obtain  the  solution  of  the  entire  network  by  combining  the  results  of 
subnetwork  process. 


In  the  following  the  procedure  of  constructing  the  node  tearing  matrix 
from  subnetworks  is  detailed.  Let  us  consider  one  subnetwork  N^  Let  the 
tearing  nodes  which  are  connected  to  be  denoted  by  ati,  the  node  voltage 
of  ati  be  denoted  by  ^ti>  the  nodes  of  be  denoted  by  a^,  the  node 
voltages  of  0i  be  denoted  by  v  ,  and  the  currents  which  represents  the 
relationship  of  the  rest  of  the  network  with  this  subnetwork  be  denoted  by 
(fig.  5.5).  v  and  J  satisfy  the  following  relations: 

MCI 


v  A  Uv 

rw  L  *  , 


ti 


(5.36) 


S  J 

M 


ti 


0 


(by  KCL) 


(5.37) 


To  the  Rest 
of  the  Network 


PP-7001 
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The  node  equations  for  this  subnetwork  have  the  following  form: 


(5.38) 


By  augmenting  with  appropriate  zeros  to  match  the  dimension  and  summing  Eq. 
(5-38)  for  all  the  subnetworks  and  the  interconnection,  we  obtain 
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We  can  see  that  Eq.  (5-39)  is  identical  to  Eq.  (5-35). 


Eq.  (5.39)  is  solved  by  first  eliminating  all  the  ,Ytsi  to  obtain  the 
interconnection  matrix  equations. 


Y  „  Y 
~tt  ~tr 


Y  .  Y 
«jrt  ~rr 


(5.40) 


where  <Yct  *  Yfct  -  2  *tsi  ^si^  -sti 
1=1 

and  J  =J  -  Z  Y  .  (Y  . )  J. 

~st  list  .  ,  ~  tsi  ~si  ~ssi 

i=l 

Eq.  (5.40)  is  solved  to  obtain  solutions  for  and  ^vr ,  and  then  the 

solution  v  .  can  be  obtained  by  using  backward  substitution. 

1 

The  above  solution  procedure  can  be  modified  to  enable  us  to  process 
each  subnetwork  separately  to  obtain  the  interconnection  matrix  equations. 
Let  us  consider  Eq.  (5-38)  again,  after  eliminating  Y  . ,  we  have 

S 1 


a.  U  ,  !  L'1  Y  ^ .  v  . 

i  x  —  si  1  ~s  i  ~sti  —  si 

L  .  N  '  1 

—si  v  s  1 

-l  1  * 


cr  ,  Y  .  U  7  i  Y 
ti  ~tsi  ~si  .  ~ 


where  L  . U  .  =  Y  . , 
~sa— si  — s  l 


l’l  J  . 

~S1  —SSI 


(5.41) 


Y  =  Y  -  Y  (Y  )  *Y 
— tti  —  tti  —tsi  —si  — sti’ 
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*  -1 

and  J.  =J  .  -  Y  .(Y  .)  J  , . 

~-tsi  ^tsi  "N-tsi  ^si 

The  partial  interconnection  matrix  equations  obtained  from  Eq.  (5.41)  are 


* 

Y  v 

-tti  ~ti 


* 

itsi 


(5.42) 


By  augmenting  the  above  arrays  with  appropriate  zeros  to  match  the 

j- 

dimension  of  Y  (this  is  only  conceptual,  because  sparse  matrix  techniques 
are  used  and  every  elment  is  put  in  the  appropriate  location  in  a  one 
dimensional  array)  and  summing  up  Eq.  (5-42)  for  all  the  subnetworks  and 
the  interconnection,  we  obtain  Eq.  (5.40)  again. 


Since  £  J  .  =  0  by  KCL,  therefore  J  does  not  appear  in  the  final 
^  -wtl  ~  ~t  1 

matrix  equations  (5.39)  and  (5.40),  so  we  can  neglect  J  .  in  both  Eqs. 

~t  l 


(5-38)  and  (5.42). 


So  the  modified  solution. procedure  is  a3  follows : 


(1)  Formulate  the  simplified  Eq.  (5.38)  for  each  subnetwork,  i.e. 


®i  ati 


a.  Y  .  Y  . .  v  . 

i  ~si  ~sti  ~si 


“ti  2tsi  Ytti  Uti 


(5.43) 


J„  • 
~tsi 


(2)  Process  Eq.  (5.43)  to  obtain  the  simplified  Eq.  (5.42)  for  each 
subnetwork,  i.e. 


I feil 


(5.44) 


-116- 


(3)  Sum  up  Eqs.  (5.44)  for  all  subnetworks  and  the  interconnection 
with  appropriate  dimension  match  to  obtain  Eq.  (5.40); 

(4)  Solve  Eq.  (5.40)  to  obtain  v„  and  v  ; 

<-swu  ~r 

(5)  Solve  the  upper  part  of  Eq.  (5.41)  by  backward  substitution  to 

obtain  V  .  for  each  subnetwork  N.<  . 

_s  1  ~i 

5.5.  Sparsity  Considerations  for  the  Node  Tearing  Method 

Now  let  us  compare  different  ways  of  sparsity  exploitation  for 

processing  Eq.  (5.43).  For  the  solution  procedure  discussed  in  the 

previous  section,  both  LU  factorization  and  substitution  procedures  are 
required.  In  SLATE  the  modified  nodal  equation  formulation  and  the  new 
reordering  strategy  described  in  Chapter  2  are  used  at  the  subnetwork 
level.  After  the  current  variables  and  the  corresponding  'positive'  node 
voltage  variables  are  eliminated,  the  final  subnetwork  matrix  is 
structurally  symmetric.  So  here  we  assume  that  the  subnetwork  matrix  is 
structurally  symmetric,  under  this  assumption,  there  are  two  possible  LU 
factorization  procedures  we  would  like  to  compare.  there  are  other 

procedures  described  in  [38],  but  these  procedures  are  either  equivalent  to 
them  or  are  less  efficient. 


The  two  LU  factorization  procedures  are  denoted  by  F^  and  F2  [38],  and 
are  given  in  Table  5.1. 


where  L  ,  U  , 

=  Y  . 

-1 

V  =  L  Y 

~s»Csk 

~sk 

—  -sk'-stk 

T 

-1 

a  -l 

&  =  ItSkHsk’ 

i  -  iiski- 

*  -1-1 
and  itkEtk  =  Xttk  s  Xttk  "  XtskHsk^skXstk' 


I 


4 
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•  A  ^  A 

In  F~,  only  those  rows  of  V  (V  )  which  are  required  to  compute  V  are 

£.  •*-*  ^reij  <«wp 

computed,  V  consists  of  those  rows  of  V  corresponding  to  the  nonzero 

columns  of  J(tsk* 

Let  | .|  denote  the  number  of  nonzero  elements  in  a  vector  or  a  matrix, 
and  M(.j)  and  M(j.)  the  jth  column  and  the  jth  row  of  matrix  M, 

respectively.  Let  nt  denote  the  number  of  tearing  nodes,  and  B(k) 
satisfies  the  following  relation. 

f  0  k  =  0 

B(k)  =<  (5.45) 

1  k  £  1,  k  integer 

The  following  lemmas  are  used  to  compare  the  number  of  operations  between 

F,  and  F„  . 

1  2 

Lemma  5.1.  Suppose  the  subnetwork  matrix  is  structurally  symmetric,  then 

the  number  of  rows  of  V  equals  the  number  of  nonzero  rows  of  V. 

~req  ~ 

A 

Proof:  From  the  definition  of  V  ,  we  know  that  any  nonzero  row  of  V  which 

~P  ~ 

A 

is  not  a  row  of  JT  must  consist  of  only  fill-ins.  Suppose  it  is  row  i, 

then  there  must  be  a  nonzero  row  j  of  V  ,  j  <  i,  and  a  nonzero  L  .  ...  Row 

~p  ~sk,ij 

j  together  with  L  ,  ..creates  the  fill-ins  of  row  i.  Due  to  structure 

~SK,iJ 

symmetry,  there  also  must  be  a  nonzero  U  .  So  in  order  to  evaluate  the 

^ S K, J 1 

row  j  of  V  ,  it  is  necessary  to  evaluate  the  row  i  of  V. 

r  1^* 

Lemma  5.2.  Suppose  the  subnetwork  matrix  is  structurally  symmetric  and  £  k 
is  mxm,  then  the  difference  in  the  number  of  operations  between  F^  and  F ^ 
(DDF)  is: 


A 
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m  m 

dmf  =  z  |v(j-)|  1  V(j-)|  +  r  (|v  (j-)|  -D|v(j-)i 

J=i  J=*l 

•  2  2  l-Hsk(ji)lli(i')lB(l^(i*)l  "  2  l^eak(*J)li5(J,)lB(l^(j,)l)  (5*46) 

j*2  i*j+l  j=l 

If  V  is  full,  then 
~req 

m  m 

DNF  =  I  |V(j*)||V(j*)|  '  £  (|Usk(J‘)|  -  l)(nt  -  |v(j-)|)B(|V(j-)|) 
j*l  ~  ~  j  =  l 


•  "tUtsJ 


(5.47) 


Proof:  DNF  =  £  (|Usk(-j)|  -l)|w(j.)|  +  2  |WT(* j)|  |  V(j-)| 

j*l  ~  ~  j=l  ~ 

IJ’Sk«1>nireq<i-)l  ‘  <5'48> 

j=*2  i=j+l  M  j»l  H 

From  structure  symmetry,  we  obtain 


|w(j*)|  -  |  V(J-)| 

I WT(- j) |  -  |V(j.)| 


(5.49) 


(5.50) 


(5.51) 


From  Lemma  5.1,  we  obtain 


V.  (j*)(  *  |v(j-)iB(|v(j-)|) 


(5.52) 


Substituting  Eqs.  (5.49),  (5.50),  (5.51),  and  (5.52)  into  Eq. 

(5.48),  we  obtain  Eq.  (5.46). 


If  Y„  is  full,  then 
~req  ’ 


-  ne 


(5.53) 


Substituting  Eq.  (5.53)  into  Eq.  (5.46),  we  obtain  Eq.  (5.47). 
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Associate  d  with  F  and  F? ,  there  are  five  possible  substitution 

*  **  *  -kk 

procedures  denoted  by  S^,  S^,  S^  ,  S2  ,  and  S?  [38]  which  are  given  in 

Table  5.2,  where  b  consists  of  the  rows  of  b  which  are  required  to 
~req  ~ 

compute  b  .  b  are  those  rows  of  b  corresponding  to  the  nonzero  columns  of 
~P  -  P  ~ 


Let  C(.)  denote  the  number  of  operations  required  in  performing  a 
given  procedure  s-  By  comparing  the  entries  of  the  five  substitution 
procedures  in  Table  5.2,  the  following  equation  is  obtained. 


7%  A  A  A  A  A 

C(sJ)  -  C(S^  )  =  C(S2  )  -  C(S2  ) 


(5.54) 


In  large  scale  integrated  circuits,  most  of  devices  are  nonlinear. 
Due  to  the  current  sources  generated  by  Newton-Raphson  iteration  and 
numerical  integration,  it  is  reasonable  to  assume  that  the  source  vectors 
Jssk  and  Jtsk  are  full.  Also,  since  Vgk  and  Vck  are  the  required  node 
voltages,  it  is  reasonable  to  assume  that  they  are  full.  Under  the  above 
assumptions,  we  obtain  the  following  lemmas. 


Lemma  5.3.  a,  b,  y,  and  a  are  full. 


Lemma  5.4.  Suppose  that  the  subnetwork  matrix  is  structurally  symmetric, 

then  the  number  of  rows  of  b  equals  the  number  of  nonzero  rows  of  V. 

-  req 

Lemma  5.5.  Suppose  that  that  subnetwork  matrix  is  structurally  symmetric 

and  J  is  mxm,  then  the  difference  in  the  number  of  operations  between  S^ 
* 

and  S^  (DNS1 )  is: 
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DNS1  =  C(S1)  -  CtS^ 


m  lu  iu 

=  jS1|V(j.)|321(jUsk(j-)|-l)B(iv(j.)|)I21!YtskCj)| 

SlYl-|?tskljS1(|Usk(j‘)|-l)B(|v(j-)|) 

m 

= (number  of  fill-ins  in  V)  -  2^(|  * )  |-1)B(  |v(j  *  )|) 


(5.55) 


Lemma  5.6.  Suppose  that  the  subnetwork  matrix  is  structurally  symmetric 

and  Ygk  is  mxm,  then  the  difference  of  number  of  operations  between  S^  and 
** 

Sx  (DNS2)  is: 


DNS2  =  C(SL)  -  C(S1  ) 

nt  nt  m 

=  DNS1  .3S1fV(-3)  |  j|x|  |B<  (vaO  I ) 


=  2*DNS1  -j51B<  |V(j •) | ) 


(5,56) 


Remark.  From  Lemma  5.6  we  can  conclude  that  if  C(S^)  <  C(S^)  then 
**  * 

C(SX)  <  C ( S ),  that  is,  only  when  C(S^)  >  C(S^)  do  we  need  to  compare 

with  s£*. 


Lemma  5.7.  Suppose  that  the  subnetwork  matrix  is  structurally  symmetric 


and  Y  ,  is  mxm,  then 
~sk 


C(S*)  <  C(S*) 


,  **  ,  ** 
C(S.  )  <  C(S -  ) 


(5.57) 


(5.58) 


Proof:  From  Eq.  (5.54),  we  obtain 
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*  *  **  ** 
CCS^  -  C(S2)  =  C(SX  )  -  C(S2  ) 


*  k 

So  we  only  need  to  prove  C^)  -  C(S2)  <  0 


ccs*)  -  c(s2)  =ji1<!usk^')l'1)B(lY(j')l)"j-1(|ySk(J')l"1) 


Since 


and 


so  we  have 


+jli(  ' 1  l?.k« -)l-i)|z2(j)| 

=jii(|Dgk(j-)|-1)(B(|V(j-)1)-|Z2(j)|) 

|Z1(j)|-B(|V(j-)|) 

|Z2(J)  1 


|z2(j)l>B(|v(j-)|) 

Substituting  Eq.  (5.63)  into  Eq.  (5.60),  we  obtain 

CCS*)  -  CCS*)  <  0 


(5.59) 


(5.60) 

(5.61) 

(5.62) 

(5.63) 

(5.64) 


** 

From  Lemma  5.7,  we  know  that  S2  and  S2  are  not  as  efficient  as  the 

other  procedures,  so  they  are  eliminated  from  the  list  of  possible 

*  ** 

substitution  procedures.  Now  we  are  left  with  Sp  S^  and  S^  .  The 

possible  combinations  of  factorization  methods  and  substitution  methods  are 
*  **  *  _  ** 

Fl  +  Sp  F^  +  Sp  F^  +  S^  ,  F2  +  Sp  and  F2  +  S^  .  Theoretically  we  can 
not  eliminate  any  of  these  five  combinations,  because  we  can  always  come  up 
with  a  special  subcircuit  structure  for  which  a  particular  combination 
gives  the  best  result.  However,  after  conducting  a  large  number  of 
studies,  we  found  experimentally  that  F^  ♦  S,  gives  the  best  results  for 
all  the  practical  circuits  we  used;  moreover,  F1  is  well  compatible 


•  T — 
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with  the  sparse  matrix  techniques  and  is  the  easiest  one  to  implement,  30 
F  +  was  chosen  to  be  used  in  the  program  SLATE. 

In  the  following  we  would  like  to  present  a  small  selection  of  the 
examples  which  we  have  analyzed  by  using  Lemma  5.2,  Lemma  5.5  and  Lemma 
5.6. 


F.vamnl a  5.1.  The  subcircuit  used  is  a  TTL  two-input  NAND  gate  with 

parasitic  resistors  included  (Fig.  5.6).  The  admittance  matrix  for  this 
subcircuit  is  shown  in  Fig.  5.7. 

DNF  =  20  -  23  -  39  =  -42 

DNS1  =  9  -  13  a  -4 

DNS2  =  -8  -  10  =  -18 

so  the  best  combination  for  this  subcircuit  is  F^  +  S^. 

Example  5.2.  The  subcircuit  used  is  an  ECL  two-input  NOR  gate  with 

parasitic  resistors  included  (Fig.  5.8).  The  admittance  matrix  for  this 
subcircuit  is  shown  in  Fig.  5.9. 

DNF  =  17  -  15  -  27  =  -25 

DNS1  =  6  -  8  =  -2 

DNS 2  =  -4  -  6  =  -10 

so  the  best  combination  for  this  subcircuit  is  F^  *  S^. 

Example  5. 3.  The  subcircuit  used  is  an  MOS  two-input  NAND  gate  with 
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parasitic  resistors  included  (Fig.  5.10).  The  admittance  matrix  for  this 
subcircuit  is  shown  in  Fig.  5.11. 

DNF  z  13  -  9  -  30  =  -21 

DNS1  z  3  -  5  =  -2 

DNS 2  =  -4  -  7  =  -1 1 

so  the  best  combination  for  this  subcircuit  is  F^  +  Sp 

These  three  examples  show  that  at  the  subnetwork  level  the  best 
combination  is  F^  +  Sp 

For  the  interconnection  matrix,  because  this  matrix  has  the  same 
property  as  that  of  the  matrix  formulated  by  the  MNA  for  any  circuit ,  so 
the  new  reordering  strategy  of  the  MNA  and  the  Markowitz  sparse  matrix 
scheme  are  used  to  exploit  the  sparsity  at  this  level. 


5.6.  Implementation  of  the  Node  Tearing  Method 


As  concluded  in  the  previous  section  the  best  combination  at  the 
subcircuit  level  is  in  most  cases  F^  +  Sp  now  we  would  like  to  describe 
the  implementation  of  this  approach.  First,  at  the  subcircuit  level,  the 
source  vector  is  appended  to  the  matrix  to  form 


istk 


(5.65) 


Secondly,  use  the  new  reordering  strategy  of  the  MNA  and  the  Markowitz 
sparse  matrix  scheme  to  find  the  LU  factorization  of  Xsk*  The  LU 
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Fig-  5.11  Admittance  Matrix  for  the  Subcircuit  in  Fig.  5.10 


factorization  operates  on  the  whole  matrix  but  terminates  after  the  LU 
factorization  of  Ygk  is  obtained.  Now  the  original  matrix  is  transformed 


Thirdly,  formulate  the  interconnection  matrix  equation  (5.40).  Fourthly, 

use  the  new  reordering  strategy  of  the  MNA  and  the  Markowitz  sparse  matrix 

scheme  to  find  the  LU  factorization  of  Eq.  (5.40),  and  use  forward  and 

backward  substitution  to  find  the  solutions  for  yt  and  yr.  Fifthly,  use 

backward  substitution  to  solve  the  upper  part  of  Eq.  (5.46)  to  obtain  the 

solutions  V  ,  . 

~sk 

Remark :  The  reordering  of  the  subnetwork  and  network  matrix  equations  is 
done  in  the  preprocessing  phase.  The  subnetwork  matrix  equations  are 
reordered  first.  So  when  the  network  matrix  equations  are  reordered,  the 

'ft 

structures  of  all  the  Y_..  's  are  known.  From  the  structures  of  all  the  Y._,  's 

^ttk  ~ttk 

and  the  circuit  description  of  the  network,  we  can  reorder  the  network  matrix 
equations  by  the  new  reordering  strategy  of  the  MNA  and  the  Markowitz  sparse 


matrix  scheme. 
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5.7.  Circuit  Interpretation  of  the  Tearing  Methods 

Although  the  derivations  of  the  tearing  methods  are  quite 
mathematical,  there  is  a  very  simple  circuit  interpretation.  The  node 
tearing  method  is  just  a  generalized  Norton  equivalent  circuit  approach. 
The  branch  tearing  method  is  just  a  generalized  Thevenin  eqivalent  circuit 
approach. 

Let  us  consider  node  tearing  first.  Consider  the  special  case  when 
there  is  only  one  tearing  node.  The  partial  interconnection  matrix 
equations  (Sq.  (5-42))  that  we  obtain  for  each  subnetwork  are  just  the 
Norton  equivalent  circuit  matrix  equations  for  that  subnetwork.  This  is 
illustrated  in  Fig.  5.12(a).  So  for  the  case  when  there  are  more  than  one 
tearing  node,  Eq.  (5*42)  is  just  the  generalized  Norton  equivalent  circuit 
matrix  equations  for  each  subnec.  See  Fig.  5- 12(b)  for  an  illustration  of 
the  case  of  two  tearing  nodes. 

Now  let  us  consider  branch  tearing  for  the  special  case  when  there  is 
only  one  tearing  branch.  The  partial  interconnection  matrix  equations  that 
we  obtain  for  each  subnetwork  are  just  the  Thevenin  equivalent  circuit 
matrix  equations  for  that  subnetwork.  This  is  illustrated  in  Fig. 
5-  15(a).  So  for  the  case  when  there  is  more  than  one  tearing  branche,  the 
partial  interconnection  matrix  equations  that  we  obtain  for  each  subnetwork 
are  just  the  generalized  Thevenin  eqivalent  circuit  matrix  equations  for 
that  subnetwork  Fig.  5. 13(b)  illustrates  the  case  of  two  tearing  branches. 


Fig.  5.12  Norton  Equivalent  Circuit  Interpretation  of  the  Node  Tearing  Method 


1 
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5.8.  Discussion  and  Conclusion 

The  reason  why  we  considered  many  possible  LU  factorization  and 
substitution  procedures  at  the  subnetwork  level  is  that  at  the  subnetwork 
level  we  only  want  to  perform  Gaussian  elimination  for  the  variables  vgk 
and  the  elimination  procedure  terminates  after  the  LU  factorization  of  Ygk 
is  obtained.  At  the  interconnection  level,  because  we  perform  the  Gaussian 
elimination  for  all  the  variables,  only  one  LU  factorization  and 
substitution  procedure  is  used. 

As  discussed  in  Section  5.5,  it  is  possible  to  generate  special 
subcircuit  structures  for  which  some  other  combinations  give  better 
results,  however,  our  experimental  results  show  that  even  for  these 
specially  constructed  circuits  the  difference  of  the  number  of  operations 
is  only  1  or  2  most  of  the  time,  so  thi3  very  small  savings  does  not 
Justify  the  extra  difficulties  of  implementation  associated  with  these 
other  combinations;  moreover,  for  all  the  practical  circuits  we  tested, 
+  S  showed  considerable  savings  over  all  the  other  combinations. 


4 
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VI.  LATENCY  EXPLOITATION 

In  conventional  circuit  simulation  programs  [1,2],  all  of  the  node 
voltages  or  branch  voltages  and  currents  are  calculated  at  each  iteration 
and  each  timepoint.  Even  with  sparse  matrix  techniques  the  simulation  of 
modern  large-scale  integrated  (LSI)  circuits  is  not  possible  in  many 
situations  due  tc  the  excessive  computation  time  and  high  storage 
requirements.  The  latency  approach  is  a  circuit  analysis  version  of  the 
selective  trace  approach  used  in  logic  simulation.  This  approach  takes 
advantage  of  the  fact  that  in  some  circuits  only  a  small  portion  of  the 
circuit  is  active  at  any  given  time  and  at  any  iteration,  and  t  us  provides 
savings  in  CPU  time. 

In  the  program  SLATE  this  approach  is  applied  at  three  levels:  (1) 
device  level;  (2)  subnetwork  level;  (3)  network  level.  At  the  device 
level,  it  is  also  called  the  bypass  scheme.  This  scheme  is  done  by 
monitoring  the  operating  point  each  nonlinear  device.  If  the  operating 
point  does  not  change  significantly  between  timepoints  or  Newton-Raphson 
iterations,  then  the  device  models  are  not  reevaluated,  and  the  matrix 
entries  computed  at  the  previous  timepoint  or  the  previous  iteration  are 
used  again.  This  scheme  is  used  in  SPICE2  [2]  and  SPLICE  [39]-  Latency  at 
the  subnetwork  and  network  levels  can  be  well  exploited  when  tearing 
methods  are  used  to  analyze  the  network.  The  tearing  method  used  in 
program  SLATE  is  node  tearing,  so  in  the  following  discussion  about  latency 
at  the  subnetwork  and  network  levels,  node  tearing  is  assumed.  Latency 
exploitation  at  the  subnetwork  level  is  presented  in  Section  6.1.  Latency 
exploitation  at  the  network  level  is  presented  in  Section  6.2.  In  Section 
6.3  a  discussion  is  given. 


e 
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6.1.  Latency  Exploitation  at  the  Subnetwork  Level 

As  discussed  in  Chao  ter  V,  the  node  tearing  method  is  just  a 
generalized  Norton  equivalent  circuit  apo roach.  After  all  the  internal 
circuit  variables  of  a  subnetwork  are  eliminated,  a  generalized  Norton 
equivalent  circuit  of  the  subnetwork  is  obtained.  Combining  the  equivalent 
circuits  of  all  the  subnetworks  with  the  rest  of  the  network  (Fig.  6.1), 
we  obtain  the  interconnection  circuit.  So  after  applying  the  node  tearing 
method  to  tear  the  network  apart  into  several  subnetworks,  the 
preprocessing  of  each  subnetwork  to  obtain  the  contribution  to  the 
interconnection  matrix  is  equivalent  to  constructing  an  equivalent  circuit 
for  each  subnetwork.  If  the  solutions  of  the  circuit  variables  of  a 
subnetwork  do  not  change  significantly  between  timepoints  or  Newton-Raphson 
iterations,  then  there  is  no  need  to  reconstruct  an  equivalent  circuit  for 
that  subnetwork.  The  equivalent  circuit  constructed  at  the  orevious 
timepoint  or  the  previous  iteration  i3  used  again  and  the  subnetwork  is 
declared  as  latent.  The  subnetwork  remains  latent  until  the  solutions  of 
the  circuit  variables  of  the  subnetwork  change  significantly  between  when 
it  is  declared  latent  and  the  present  time  or  the  present  iteration.  This 
is  the  basic  concept  of  latency  at  subnetwork  level. 

There  are  two  types  of  latency  at  the  subnetwork  level:  one  is 

latency  in  the  Newton-Raphson  iteration,  the  other  is  latency  in  time. 

Latency  in  the  Newton-Raphson  iteration  i3  not  natural.  It  is  related 
to  the  convergence  property  of  each  subnetwork  and  the  initial  guess  of  the 
operating  point  for  each  subnetwork.  Let  us  consider  the  example  in  Fig. 

6.2.  It  is  assumed  that  the  input  signal  is  constant  and  that  a  dc 
analysis  is  required.  Different  subnetworks  may  require  different  number 


/ 
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FP-7048 


Fig.  6.1(a)  A  Network  Partitioned  into  Three  Subnetworks  by 
the  Node  Tearing  Method. 

(b)  Equivalent  Interconnection  Circuit  obtained  from 
the  Node  Tearing  Method. 


Subnetwork  \  f  Subnetwork 


Fig.  6.2  Example  of  Latency. 
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of  iterations  to  converge,  for  example,  because  the  initial  guess  of  the 
operating  points  may  be  good  for  some  subnetworks  and  bad  for  the  others. 
After  these  subnetworks  converge,  they  are  declared  as  latent.  The 
ftewton-Raphson  iterations  continue  until  all  the  subnetworks  converge. 
This  phenomenon  is  called  latency  in  the  Newton-Raphson  iteration.  Taking 
advantage  of  this  latency  results  in  savings  in  the  execution  time.  This 
latency  may  not  exist  in  some  circuits.  For  example,  a  linear  circuit. 

Latency  in  time  is  a  natural  phenomenon.  All  physical  devices  have 
intrinsic  delay  time  between  excitation  and  response.  For  a  large  network, 
when  the  input  signal  changes,  it  takes  time  for  this  change  to  oropagate 
to  the  rest  of  the  network.  Let  us  consider  the  example  in  Fig.  5.2 
again.  Let  us  assume  that  the  input  changes  from  OV  to  5V  at  time  t^ . 
Initially,  orobabably  only  the  first  few  subnetworks  are  not  latent,  the 
rest  of  the  subnetworks  are  latent.  As  time  oasses,  the  change  in  the 
response  propagates  to  the  intermediate  subnetworks.  At  this  time,  only 
the  intermediate  subnetworks  are  not  latent,  the  rest  of  the  subnetworks 
are  latent.  Finally,  the  change  propagates  to  the  last  few  subnetworks  and 
the  rest  of  the  subnetworks  are  latent.  This  phenomenon  is  called  latency 
in  time,  and  it  always  exists  in  real  circuits.  Taking  advantage  of  this 
latency  results  in  savings  in  the  execution  time. 

In  order  to  exploit  these  two  tyoes  of  latency,  some  sort  of  latency 
criteria  are  required  to  determine  if  a  subnetwork  is  latent.  The  latency 
criteria  proposed  here  are  developed  for  the  node  tearing  method,  if  the 
branch  tearing  method  is  used,  these  criteria  should  be  modified 


accordingly . 
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Let  us  consider  one  subnetwork  Let  the  tearing  node  voltages  of 
Nk  be  denoted  by  vtk,  the  node  voltages  of  N^  be  denoted  by  vsk,  the  node 
voltages  of  all  the  nonlinear  devices  be  denoted  by  vn}_k. 

First,  let  us  consider  the  latency  criterion  in  the  Newton-Raphson 
iteration.  In  principle,  the  solution  of  all  the  circuit  variables  of  a 
subnetwork  should  be  checked  to  determine  if  the  subnetwork  is  latent. 
However,  to  check  for  latency  in  the  Newton-Raphson  iteration  only  the  node 
voltages  of  all  the  nonlinear  devices  need  be  checked.  The  latency 
criterion  in  the  Newton-Raphson  iteration  used  in  SLATE  is  as  follows:  A 
subnetwork  Nk  is  declared  as  latent  at  the  ith  iteration  if  the  following 
two  conditions  are  satisfied. 


(1)  |vnlk  (i-l)-vnlk  (i-2)|  <  £a+  er  max(|vnlk  ^  >  (6.1) 


|vnik  (1-2)1) 
m 


m  =  1,2, _ 


(i)-vtk  (i-l)|  <  ea+  sr  max(|vtk  (i)  |  ,  |v  (i-l)j  ) 

m 


tk  tk 

m  m 


m  *  1,2 , ... . 


where  e  and  e_  are  absolute  and  relative  error  criteria, 
a  * 


The  subnetwork  Nk  will  remain  latent  as  long  as 


(6.2) 


(3)  |v  k  (i-fj)-v  (1-1)|  <  e  +  Er  max(|v  k  (i+j)|, 


(6.3) 


|v  (i-1) j )  m  =>  1,2, - 

to  j  *  1 , 2 , - 

Once  a  subnetwork  is  declared  as  latent  in  the  Newton-Raphson 
iteration,  no  linearization  of  the  nonlinear  devices  of  Nk  is  required,  no 
preprocessing  of  the  subnetwork  to  obtain  the  partial  contribution  to  the 
interconnection  matrix  is  required,  no  backward  substitution  to  obtain  the 
solutions  of  the  internal  circuit  varijbles  is  required,  and  no  convergence 
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tests  are  required.  One  only  needs  to  monitor  the  tearing  node  voltages 
and  to  bring  the  previous  partial  contribution  to  the  interconnection 
matrix . 

The  simulation  data  from  SLATE  show  that  considerable  savings  in 
execution  time  is  obtained  and  the  output  results  are  essentially  the  same 
as  those  from  YSPICE.  Table  6.1  gives  the  simulation  data  from  SLATE  for 
the  circuit  shown  in  Fig.  6.3-  A  dc  analysis  was  performed.  For  this 
circuit,  a  42.4216  latency  exploitation  was  achieved  and  a  22.65?  savings  in 
CPU  time  was  obtained. 


Remark:  Because  the  CPU  time  shown  in  Table  6. 1  is  the  total  CPU  time, 
which  includes  the  time  spent  in  the  I/O  and  other  utility  subroutines,  the 
savings  in  CPU  time  is  not  the  same  as  the  latency  exploitation. 

For  the  latency  in  time  criteria,  four  schemes  are  proposed  here.  The 
first  three  schemes,  scheme  0,  scheme  1  and  scheme  2,  have  been  implemented 
and  tested  in  program  SLATE.  Scheme  3  is  still  under  investigation. 


Scheme  0  is  the  easiest  and  the  crudest  scheme  that  could  be 
implemented.  A  subnetwork  is  considered  latent  at  time  t^  if 


(1)  Kk  (tn)-vtk  ‘W'  *  V  £r  max(|vtk  (tn)i’ 
mm  m 


The  subnetwor 


V  (tn-l)1) 

C'PN 


m  *  1,2, _ 

will  remain  latent  as  long  as 


(6.4) 


^  ^Vtk  vtk  ^n-lM  <  e  +  e  max(|v  (t  )  , 

m  J  m  a  r  tk  n+j' 

m 


tn  *  1,2,.... 

j  "  1,2,.... 

The  advantages  of  Scheme  0  are: 


» 


(6.5) 


(b)  Entire  Network:  A  Chain  of  Inverters. 
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Table  6.1  Simulation  Data  of  a  DC  Analysis 
for  the  MOS  Circuit  in  Fig.  6.3. 


DC  analysis 

SLATE 

SPICE2 

#  of  subnetworks 

times  #  of  j  231 

iterations  j 

231 

#  of  nonlatent 
subnetworks  times 

#  of  iterations 

133 

Latency  exploitation 

42.42% 

Total  CPU  time 
(sec . ) 

3.927 

5.077 

Savings  in  CPU 
t  ime 

22.65% 
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(1)  It  is  very  easy  to  implement  and  there  is  no  overhead. 

(2)  It  is  faster  than  Scheme  1 
The  disadvantages  of  Scheme  0  are: 

(1)  It  is  not  reliable  and  it  is  not  accurate.  If  the  network  is  a 
stiff  system  and  some  of  the  node  voltages  are  slowly  varying,  then  it  is 
possible  to  declare  a  slowly  varying  subnetwork  as  latent  and  consequently 
wrong  answers  are  obtained. 

(2)  The  tearing  node  voltages  at  time  t  must  be  stored,  that  is  , 
more  memory  is  required. 

Scheme  1  is  the  most  accurate  scheme,  and  it  only  takes  advantage  of 
latency  in  the  Newton-Raphson  iteration.  It  i3  based  on  the  idea  that  if  a 
subnetwork  is  latent  in  time,  then  it  is  also  latent  in  the  Newton-Raphson 
iteration.  Even  if  we  do  not  take  advantage  of  latency  in  time,  all  the 
subnetworks  are  treated  as  nonlatent  in  time  and  are  solved  at  least  once 
at  every  timepoint.  Those  subnetworks  which  are  latent  in  time  at  any 
timepoint  will  be  declared  as  latent  in  the  Newton-Raphson  iteration  after 
one  iteration  at  that  timepoint.  So  at  most  one  iteration  for  each  latent 
subnetwork  is  wasted  at  one  timepoint.  Scheme  1  is  as  follows: 

(1)  Solve  the  entire  network  including  all  the  subnetworks  at  least 
once  at  every  timepoint. 

(2)  If  any  subnetwork  is  latent  in  time,  then  it  is  latent  in  all  the 
subsequent  Newton-Raphson  iterations  at  that  timepoint.  So  only  one 
iteration  for  that  subnetwork  is  performed. 


t 


k 


i 
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(3)  For  the  nonlatent  subnetworks,  the  Newton-Raphson  iterations  are 
continued  until  convergence  is  obtained  at  that  timepoint. 

The  advantages  of  Scheme  1  are: 

(1)  It  is  very  easy  to  implement  and  there  is  no  overhead. 

(2)  The  tearing  node  voltages  at  time  tn-1  need  not  be  stored,  that 
is,  less  memory  is  required. 

(3)  It  is  accurate  and  reliable. 

The  disadvantage  of  Scheme  1  is  that  at  every  timepoint  one  iteration  is 
wasted  for  each  subnetwork  latent  in  time. 

Scheme  0  is  efficient  but  is  not  reliable.  Scheme  1  is  reliable  but 
is  not  efficient.  If  the  network  being  solved  is  not  stiff,  then  Scheme  0 

is  preferred.  However,  if  the  network  i3  stiff  and  efficiency  is 

important ,  then  both  Scheme  0  and  Scheme  1  are  not  suitable .  Scheme  2  wa3 
developed  to  accommodate  this  situation.  It  i3  similar  to  Scheme  0  in 
efficiency  and  it  is  similar  to  Scheme  1  in  reliability.  It  differs  from 
Scheme  0  in  that  3ome  extra  checks  are  made  to  make  sure  that  slowly 
varying  subnetworks  will  not  be  declared  as  latent.  All  the  slowly  varying 
subnetworks  are  declared  as  nonlatent. 

Let  the  charges  of  capacitors  and  the  fluxes  of  inductors  of 

subnetwork  Nk  be  denoted  by  Qk  =  (Qkl'®k2» . ,Qkb).  ^et  the  currents  of 

capacitors  and  the  voltages  of  inductors  of  subnetwork  Nk  be  denoted  by 

-k  s  ^Ikl’Ik2> . Ifcb^  •  Scheme  2  is  as  follows:  A  subnetwork  is 

considered  as  latent  if  the  following  three  conditions  are  satisfied. 
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<D  Kk  (ta>-vtk  (tn-l)!  <  V  £r  max(lvtk  (tn>  !  * 


(6.6) 


|v  (t  ,) 
1  tk_  n-1 


m  *  1 ,  2 ,  •  •  *  t 


This  condition  is  the  same  as  that  of  Scheme  0. 


(2)  K  (tn)-Ik  *  £c+  £r  max(|l  (tn)|, 

mm  111 


(6.7) 


'rk  ('n-l’D 

m 


m  =  1,2  ...»  6 


where  e  is  the  absolute  error  criterion  for  current.  This  condition  is 
c 

used  to  check  if  the  changes  of  the  energy-storage  elements  of  subnetwork 


are  small. 


<3)  Vl  Qk  (c„)-Qk  (t^j)  1 


m  »  1,  2,  ...,  b 


(6.8) 


This  condition  is  used  to  check  if  there  are  slowly  varying  nodes  within 

subnetwork  Nk.  In  order  to  avoid  division  by  zero,  if 

|lk  (tQ)-I.  (cn_1)|  <  e  ( e  is  a  very  small  quantity,  it  is  1CT42  in 
m  Km 

SLATE),  then  condition  (3)  is  skipped. 


The  subnetwork  Nk  will  remain  latent  as  long  as 

(*>  Kk  <W-Vtk  *  sa+  Er"a*(lvtk  CC»+3 > 1  *  <6'9> 

m  m  m 

K^n-l^  m-1,  2,  ... 

Eq.  (6.81  is  derived  based  on  the  following  reasoning.  Let  us  assume 

that  we  are  dealing  with  a  linear  capacitor  and  an  exponential  waveform, 


Q  —  C«V 


(6.10) 


*a.  i 

dt 


(6.11) 
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/  -C/t 

V  =  VDD(1-  e  ) 

(6.12) 

t  /t 

Q(tn)  =  C*^D(1  -  e  n  ) 

(6.13) 

Cn  \!  T 

Q(tn_i)  =  C*VDD(1  -  e  ) 

(6.14) 

C*VDD  -t/t 

I(tn)  -  -  T  e  n 

(6.15) 

C*VDD  *’tn-l/x 
‘  -  T  e 

(6.16) 

From  Eqs.  (6.13),  (6.14),  (6.15),  and  (6.16),  we  obtain 

h  I(tn)~I(l:r'-l)  Vl  (6.17) 

So  Eq.  (6.8)  means  that  although  the  change  in  the  response  of  a  capacitor 
is  very  small,  if  hn_^  is  smaller  than  t,  then  the  capacitor  is  not  latent, 
it  is  just  slowly  varying. 

The  advantages  of  Scheme  2  are: 

(1)  It  is  faster  than  Scheme  1. 

(2)  It  is  accurate  and  reliable.  Slowly  varying  subnetworks  are 
detected  and  are  treated  as  nonlatent  subnetworks. 

The  disadvantages  of  Scheme  2  are: 

(1)  More  checking  is  required. 


* 

.  t. 


i 


(2)  The  tearing  node  voltages  at  time  t  ^  must  be  stored,  that  is, 
more  memory  is  required. 

(3)  slowly  varying  subnetworks  are  detected  and  are  treated  as 
nonlatent  subnetworks,  even  though  the  changes  in  the  response  may  be 
negligible. 

Remark :  The  detection  of  slowly  varying  subnetworks  increases  the  accuracy 
and  reliability  of  the  program,  that  is  why  it  is  an  advantage.  However, 
since  the  changes  in  these  slowly  varying  subnetworks  are  small,  treating 
them  as  nonlatent  subnetworks  is  not  efficient,  that  i3  why  it  is  also  a 
disadvantage. 

In  order  to  overcome  this  problem,  Scheme  3  was  proposed.  Scheme  3 
takes  full  advantage  of  latency  in  time.  Scheme  3  is  as  follows: 

(1)  The  truncation  error  criteria  are  used  to  determine  the  timestep 
for  each  subnetwork. 

(2)  Each  subnetwork  is  analyzed  with  its  own  timestep. 

The  advantages  of  Scheme  3  are: 

(1)  It  should  be  faster  than  all  the  other  schemes. 

(2)  It  is  accurate  and  reliable.  Since  every  subnetwork  has  its  own 
timestep,  there  will  not  be  the  problem  of  slowly  varying  subnetworks. 


The  disadvantages  of  Scheme  3  are: 
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(1)  It  is  not  compatible  with  the  present  version  of  SLATE.  An  extra 
event  scheduler  is  required  and  the  data  structures  of  SLATE  has  to  be 
revised. 

(2)  It  is  more  complicated  to  implement. 

Because  Scheme  3  is  not  compatible  with  the  present  version  of  SLATE, 
therefore  it  has  not  been  tested  and  implemented  in  SLATE. 

In  the  following,  a  small  selection  of  examples  is  presented  to  give  a 
comparison  among  the  first  three  schemes  of  SLATE:  Scheme  0,  Scheme  1,  and 
Scheme  2,  and  YSPICE. 

Example  6.1:  The  MOS  circuit  shown  in  Fig  6.3  was  analyzed  by  SLATE  and 
YSPICE.  The  output  results  of  the  three  3Cheme3  of  SLATE  and  YSPICE  are 
essentially  the  same  (within  four  significant  figures).  The  simulation 
data  of  a  transient  analysis  are  given  in  Table  6.2.  The  simulation  data 
show  that  both  Scheme  0  and  Scheme  2  are  more  efficient  than  Scheme  1.  For 
this  circuit,  Scheme  2  is  the  most  efficient  and  is  about  2.5  times  faster 
than  YSPICE. 

This  example  shows  that  the  latency  in  time  approach  is  useful  for  the 
analysis  of  MOS  circuits.  The  next  example  shows  that  the  latency  in  time 
approach  is  also  useful  for  bipolar  circuits. 

Example  6.2:  The  TTL  circuit  shown  in  Fig  6.4  was  analyzed  by  SLATE  and 
YSPICE.  The  output  results  of  the  three  schemes  of  SLATE  and  YSPICE  are 
essentially  the  same(within  four  significant  figures).  The  simulation  data 
of  a  transient  analysis  are  given  in  Table  6.3*  For  this  example,  the 
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Table  6.2  Simulation  Data  of  a  Transient  Analysis 
for  the  MOS  Circuit  in  Fig.  6.3. 


Transient 

analysis 

Scheme  0 

Scheme  1 

Scheme  2 

YSPICE 

#  of  subnetworks 
times  #  of 
iterations 

2849 

2475 

2585 

#  of  nonlatent 

s  ubnetworks  times 

#  of  iterations 

790 

1277 

706 

Latency 

exploitation 

72.27% 

48.40% 

72.69% 

Total  CPU  time 
(sec . ) 

18.358 

22.073 

17.022 

— — — - 

42.877 

Savings  in 

CPU  time 

Np- 

57.12% 

48.52% 

60.30% 

4K 


Subcircuit:  A  TTL  NAND  Gate. 
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Table  6.3  Simulation  Data  of  a  Transient  Analysis 
for  the  TTL  Circuit  in  Fig.  6.4. 


Transient 

analysis 

Scheme  0 

Scheme  1 

Scheme  2 

YSPICE 

#  of  subnetworks 
times  #  of 
iterations 

2850 

2945 

2770 

#  of  nonlatent 
subnetworks  times 

#  of  iterations 

1516 

2128 

1945 

Latency 

exploitation 

46.81% 

27.74% 

29.78% 

Total  CPU  time 
(sec. ) 

68.996 

97.164 

. 

86.033 

132.338 

Savings  in 
CPU  time 


47.86% 


26.58% 


34.99% 


simulation  data  show  that  Scheme  0  is  the  most  efficient,  Scheme  1  is  still 
the  least  efficient .  The  reason  why  Scheme  2  is  not  as  efficient  as  Scheme 
0  can  be  traced  to  the  close-coupling  of  bipolar  circuits.  So  the 
conclusion  obtained  from  this  example  is  that  the  error  criteria  should  be 
loosened  for  bipolar  circuits.  Scheme  0  is  about  2  times  faster  than 
YSPICE. 

Although  the  above  two  examples  show  that  Scheme  0  is  very  efficient, 
however,  as  mentioned  before,  Scheme  0  has  a  reliability  problem.  The 
following  example  shows  that  Scheme  0  may  give  inaccurate  output  results 
for  stiff  systems. 

Example  6.3:  The  RC  circuit  shown  in  Fig.  6.5  was  analyzed  by  the  three 
schemes  of  SLATE.  This  circuit  is  a  stiff  system,  the  output  results  are 
given  in  Table  6.4.  For  scheme  0,  because  the  changes  of  the  tearing  node 
voltages  of  subnetwork  1  and  subnetwork  2  are  very  small  after  t  =  13  ns, 
both  subnetworks  are  declared  as  latent.  Since  the  input  is  constant  too 
after  t  =  13  ns,  all  the  calculated  output  voltages  will  remain  unchanged 
afterwards,  while  the  true  output  voltages  should  increase  slowly.  This 
phenomenon  can  be  observed  from  the  unchanged  output  voltages  after 
t  =  13  n3  in  Table  6.4(a).  This  example  shows  that  Scheme  0  may  not  give 
accurate  results  when  the  network  is  stiff  and  that  both  Scheme  1  and 
Scheme  2  give  accurate  results  even  when  the  network  is  stiff. 


Subnetwork  1  Subnetwork  2 


of  a  Stiff  System* 
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Table 


TIME 

0 .  OOOD+OO 
1 .000D-09 
2. 000D-09 
3.000D-09 
4. OOOD-09 
5.000D-09 
6. OOOD-09 
7. 000D-09 
8. OOOD-09 
9. OOOD-09 
1 .000D-08 
1 . 100D-08 
1 .200D-08 
1.300D-08 
1. 400D-08 
1 .500D-08 
1 .600D-08 
1 .700D-08 
1.800D-08 
1 .900D-08 
2.0Q0D-08 
2. 100D-08 
2.200D-08 
2. 30OD-O8 
2.400D-08 
2.500D-08 


6.4(a)  Scheme  0. 


V(2)  V ( 3 )  V ( 4 ) 


0. OOOD+OO 
2. 9360+00 
3.6040+00 
3- 733D+O0 
3.7490+00 
3. 7510+00 
3.749D+00 
3.750D+00 
3.750D+00 
3.749D+00 
3.749D+00 
3.748D+00 
3-748D+O0 
3-7490+00 
3.  749D+00 
3.749D+00 
3.749D+00 
3.749D+00 
3.749D+O0 
3.749D+00 
3.749D+00 
3.749D+00 
3-749D+00 
3.749D+00 
3. 7490+00 
3.749D+00 


0.  OOOD+OO 
7-567D-01 
1 .  146D+00 
1.2370+00 
1.2490+00 
1 . 25 1D+00 
1 .249D+00 
1.250D+00 
1 . 250D+00 
1. 2490+00 
1.2490+00 
1 . 248D+00 
1 .248D+00 
1.2490+00 
1.249D+00 
1.2490+00 
1.2490+00 
1.249D+O0 
1.249D+00 
1.249D+00 
1.2490+00 
1.2490+00 
1.249D+00 
1.2490+00 
1.249D+00 
1.249D+00 


0. OOOD+OO 
4. 445D-1 3 
3- 7260-1 2 
1 . 144D-1 1 
2.403D-11 
4. 16  ID-1 1 
6. 187D-1 1 
8.020D-1 1 
9 . 850D-1 1 
1.  16  ID-1 0 
1 . 260D-10 
1 . 307D-10 
1 . 302D-1 0 
1.2460-10 

1.1450-10 
1. 145D-10 

1.1450-10 
1.  145D-10 
1 . 145D-10 
1 .  1450-10 
1 .  1450-10 

1. 1450-10 
1.  145D-10 

1. 1450-10 
1 .  145D-10 
1.  145D-10 
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Table  6.4(b)  Scheme  1. 


TIME 

V  (2 ) 

V(3) 

V  ( 4 ) 

0 . OOOD+QO 

O.OOOD-fOO 

0.0000+00 

0.  OOQD+OO 

1 .000D-09 

2. 936D+00 

7-5670-01 

4.444D-13 

2. OOOD-09 

3 . 6040+00 

1. 146D+00 

3.726D-12 

3 . 000D-09 

3- 733D+00 

1.2370+00 

1 . 144D-1 1 

4. OOOD-09 

3.749D+00 

1 . 249D+0O 

2.403D-11 

5. OOOD-09 

3. 7510+00 

1 . 25 10+00 

4. 161D-1 1 

6. OOOD-09 

3-7490+00 

1.249D+00 

6. 187D-1 1 

7. OOOD-09 

3. 750D+00 

1 . 250D+00 

8.020D-1 1 

8. OOOD-09 

3-7500+00 

1 . 250D+00 

9 . 850D-1 1 

9. OOOD-09 

3-7490+00 

1 -249D+0O 

1.  1 7  2D  —1 0 

1 .000D-08 

3-7490+00 

1.249D+00 

1 . 404D-10 

1 .  100D-03 

3-749D+O0 

1.249D+00 

1.666D-10 

1.200D-08 

3-7490+00 

1.249D+00 

1.958D-10 

1.300D-08 

3-750D+00 

1 . 250D+OQ 

2. 28 1 D — 1 0 

1 . 400D-08 

3-7510+00 

1.251D+00 

2.629D-10 

1 .500D-08 

3-7510+00 

1 -251D+0O 

2.919D-10 

1.600D-08 

3. 7510+00 

1.251D+00 

3.208D-10 

1 .700D-08 

3-7510+00 

1.2510+00 

3.498D-10 

1 .800D-08 

3-7510+00 

1.251D+O0 

3. 788D-10 

1 .900D-08 

3-751D+00 

1.251D+00 

4.077D-10 

2 . 000D-03 

3-751D+00 

1.251D+O0 

4. 367D-10 

2.  100D-08 

3-751D+0O 

1.251D+00 

4. 656D-10 

2.200D-08 

3-7510+00 

1 . 25 1D+00 

4.946D-10 

2. 300D-08 

3-7500+00 

1 . 250D+00 

5.236D-10 

2.400D-08 

3-750D+00 

1 . 250D+00 

5. 525D-10 

2.500D-08 

3-7500+00 

1.250D+00 

5. 815D-10 

-159- 


Table 


TIME 

0. 000D+00 
1 . 000D-09 
2. 000D-09 
3 . 000D-09 
4.000D-09 
5 . 0OOD-O9 
6 . 000D-09 
T.000D-O9 
8. 000D-O9 
9.000D-09 
1 . OOOD-08 
1 . 100D-08 
1 .2G0D-08 
1 .300D-08 
1.400D-08 
1 .500D-08 
1.600D-08 
1 .700D-08 
1 .800D-08 
1.900D-08 
2. 000D-08 
2.  100D-08 
2.200D-08 
2. 300D-08 
2.400D-08 
2. 500D-08 


6.4(c)  Scheme  2. 


V(2) 

O.OOOD+OO 
2.936D400 
3 . 604D+00 
3- 733D+00 
3.749D-fOO 
3.751D400 
3.749D400 
3. 750D-*00 
3-750D+00 
3-7490+00 
3.749D+00 
3-749D+00 
3.749D+00 
3- 750D+00 
3-751D+00 
3-751D+00 
3-  751D+00 
3-751D+00 
3.751D+00 
3-751D+00 
3-751D+00 
3.751D+00 
3. 751D+00 
3.750D+00 
3. 750D+00 
3-750D+00 


V(  3) 

0.  OOOD+OO 
7.567D-01 
1.  146D+00 
1 -237D+O0 
1.249D+00 
1.251D+00 
1 . 249D+00 
1.250D+00 
1 . 250D+00 
1.249D+00 
1 . 2490+00 
1 . 249D+00 
1 . 249D+00 
1 . 250D+00 
1.251D+00 
1 . 25 10+00 
1.251D+00 
1.251D+00 
1 . 25 10+00 
1 . 25 ID +00 
1.251D+00 
1 . 25 1D+00 
1.2510+00 
1 .250D+00 
1 . 250D+00 
1 . 250D+00 


V  ( 4 ) 

0.  OOOD+OO 
4. 444D-1 3 
3. 726D-12 
1 . 144D-1 1 
2.403D-11 
4.  16  ID-1 1 
6. 187D-11 
8.020D-1 1 
9 . 850D-1 1 
1. 172D-10 
1 . 404D-10 
1 . 666D-10 
1 . 958D-1 0 
2. 281D-10 
2.629D-10 
2. 91 9D-10 
3-208D-10 
3.498D-10 
3. 788D-10 
4. 077D-10 
4. 367D-10 
4. 656D-1Q 
4.946D-10 
5. 236D-10 
5.525D-10 
5.815D-10 
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6.2.  Latency  Exploitation  at  the  Network  Level 

'/Then  the  eubmatrices  are  large  and  the  interconnection  matrix  is  small 
and  sparse,  then  latency  exploitation  at  the  subnetwork  level  provides  most 


of  the  savings  in 

CPU  time 

.  When 

the  reverse  is 

true  , 

that  is,  the 

submatrices  are 

small 

and  the 

interconnection 

matrix 

is  large  and 

relatively  dense, 

then  the 

latency 

exploitation  at 

network 

level  becomes 

important.  Usually,  the  latter  situation  is  true  for  MOS  circuits. 

Latency  exploitation  at  the  network  level  is  equivalent  to  solving  a 
smaller  interconnection  matrix  by  using  voltage  source  substitution.  From 
the  substitution  theorem  we  know  that  the  same  results  will  be  obtained. 
This  approach  can  be  explained  by  the  following  example  as  shown  in  Fig. 
6.6(a).  Let  us  assume  that  at  a  particular  time  or  a  particular  iteration, 
only  subnetworks  5  and  6  are  nonlatent,  all  the  other  subnetworks  are 
latent.  Ey  using  voltage  source  substitution,  the  network  can  be  replaced 
by  the  equivalent  network  as  shown  in  Fig.  6.6(b).  The  equivalent  network 
is  solved  to  obtain  the  solutions  of  all  the  nonlatent  nodes  (nodes  which 
belong  to  nonlatent  subnetworks) .  This  equivalent  network  is  obtained  as 
follows.  First,  all  the  nonlatent  subnetworks  and  all  the  latent 
subnetworks  which  are  adjacent  to  the  nonlatent  subnetworks  are  included  in 
the  equivalent  network;  secondly,  all  the  tearing  nodes  which  only  belong 
to  latent  subnetworks  are  replaced  by  voltage  sources,  the  resulting 
network  is  the  equivalent  network.  For  this  example,  in  the  equivalent 
network,  the  nonlatent  subnetworks  are  subnetworks  5  and  6,  the  latent 
subnetworks  are  subnetworks  4  and  7,  the  tearing  nodes  which  are  replaced 
by  voltage  sources  are  nodes  5  and  9-  After  the  solution  for  all  the 
nonlatent  nodes  is  obtained,  subnetworks  4  and  7  are  checked  to  see  if  they 


remain  latent. 


If  the  answer  is  yes,  then  the  same  equivalent  circuit  is 


Equivalent  Network  for  (a). 
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used  again.  If  the  answer  is  no,  then  a  new  equivalent  network  is 
generated . 

The  above  is  the  conceptual  idea.  In  the  implementation,  because 
sparse  matrix  techniques  are  used,  we  do  not  want  to  really  generate  the 
equivalent  network  and  we  do  not  want  to  reorder  the  interconnection  matrix 
and  reconstruct  the  sparse  matrix  pointer  systems  every time  a  new 
equivalent  circuit  is  generated.  So  the  following  algorithm  is  Implemented 
in  SLATE. 

(1 )  The  ordering  of  the  interconnection  matrix  is  determined  in  the 

preprocessing  phase  assuming  all  the  subnetworks  are  nonlatent,  and  this 

ordering  is  used  in  the  whole  analysis. 

(2)  At  every  timepoint  or  iteration,  all  the  subnetworks  are  checked 
to  determine  their  latent  status.  All  the  tearing  nodes  which  only  belong 
to  latent  subnetworks  are  labelled  as  latent  nodes. 

(3)  All  the  rows  corresponding  to  the  latent  nodes  are  replaced  by  the 
branch  constraint  relations  of  grounded  voltage  sources.  This  is  done  by 
skipping  those  rows  and  columns  during  the  LU  factorization  and  forward  and 
backward  substitutions. 

Example  6.4:  The  MOS  circuit  shown  in  Pig.  3.18  was  analyzed  by  SLATE 

with  and  without  the  latency  exploitation  at  network  level.  Scheme  2  was 
used.  The  output  results  are  essentially  the  same  for  both 

approaches(whithin  four  significant  figures).  The  simulation  data  for  both 
approaches  are  given  in  Table  5.5«  This  example  shows  that  the  latency 
exploitation  at  network  level  also  provides  savings  in  CPU  time. 


* 
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Table  6.5  Simulation  Data  of  a  Transient  Analysis 
for  the  MOS  Circuit  in  Fig.  3.18. 


Transient 

analysis 

Scheme  2  with 
latency  exploitation 
at  network  level 

Scheme  2  without 
latency  exploitation 
at  network  level 

Total  CPU  time 
(sec . ) 

80.102 

102.365 

Savings  in 

CPU  time 

21.75% 
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6.3*  Discussion 

Four  schemes  for  latency  exploitation  at  subnetwork  level  are  proposed 
in  this  chapter.  Scheme  0,  Scheme  1  and  Scheme  2  were  implemented  and 
tested  in  the  program  SLATE.  Scheme  3  is  not  compatible  with  the  p  ant 
version  of  SLATE,  so  it  is  still  at  the  development  stage.  From  the 
simulation  data  obtained  from  the  first  three  schemes,  our  conclusion  is 
that  Scheme  2  is  the  best  of  these  three  schemes.  However,  for  bipolar 
circuits,  Scheme  2  is  not  the  most  efficient  one.  Conceptually,  Scheme  3 
should  be  the  optimal  one,  so  more  work  will  be  devoted  to  study  this 
scheme. 

In  order  to  illustrate  the  ideas  and  to  estimate  the  inherent  latency 
easily,  chains  of  inverters  are  used  as  example  circuits  in  this  chapter. 
More  complicated  circuits  are  used  in  the  next  chapter  to  evaluate  the 
latency  approaches  used  in  SLATE. 


« 


VII.  CONCLUSIONS 


The  examples  used  in  Chapter  6  are  chains  of  inverters.  One  example 
has  eleven  levels  of  inverters,  the  other  has  five  levels  of  inverters. 
From  the  simulation  data  we  can  see  that  the  latency  exploitation  increases 
with  the  number  of  levels  of  logic  gates.  Since  the  number  of  levels  of 
logic  gates  for  those  circuits  is  large  and  those  circuits  have  very  simple 
interconnection  networks,  so  significant  latency  exploitation  was  obtained. 
In  this  chapter,  the  simulation  data  for  some  circuits,  which  have  a 
complicated  interconnection  network  and  for  which  the  number  of  levels  of 
logic  gates  is  small,  are  presented  to  see  if  the  latency  approach  can 
provide  significant  savings  in  CPU  time  for  these  circuits.  The  simulation 
data  are  compared  with  those  obtained  from  our  DEC-10  version  of  SPICE2. 

Example  7.1;  The  TTL  circuit  shown  in  Fig.  3-17  was  analyzed  by  SLATE  and 
SPICE2.  Scheme  2  was  used  in  SLATE.  The  output  results  of  SLATE  and 
SPICE2  are  essentially  the  same  (within  four  significant  figures).  The 
simulation  data  of  a  transient  analysis  are  given  in  Table  7.1.  For  this 
bipolar  circuit  example,  a  32 . 66 %  latency  exploitation  was  achieved  and  a 
40.15$  savings  in  CPU  time  was  obtained. 

Example  7.2:  The  MOS  circuit  shown  in  Fig.  3-18  was  analyzed  by  SLATE  and 
SPICE2.  Scheme  2  was  used  in  SLATE.  The  output  results  of  SLATE  and 
SPICE2  are  essentially  the  same  (within  four  significant  figures).  The 
simulation  data  of  a  transient  analysis  are  given  in  Table  7.2.  For  this 
MOS  circuit  example,  a  22.53$  latency  exploitation  was  achieved  and  a 
46.70$  savings  in  CPU  time  was  obtained. 


Table  7.1  Simulation  Data  of  a  Transient  Analysis 
for  the  TTL  Circuit  in  Fig  3.17 


Transient 

analysis 

1 

SLATE 

. 

SPICE2 

#  of  subnetworks 
times  #  of 
iterations 

6426 

#  of  nonlatent 
subnetworks  times 

#  of  iterations 

4327 

Latency  exploitation 

1  32.66% 

Total  CPU  time 
(sec . ) 

189.56 

316.74 

Savings  in 

CPU  time 

1 

40.15% 
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Table  7.2  Simulation  Data  of  a  Transient  Analysis 
for  the  MOS  Circuit  in  Fig.  3.18. 


Transient 

analysis 

SLATE 

SPICE2 

#  of  subnetworks 
times  #  of 
iterations 

3600 

#  of  non latent 
subnetworks  times 

#  of  iterations 

2789 

Latency  exploitation 

22.53% 

Total  CPU  time 
(sec . ) 

72.23 

135.52 

Savings  in 

CPU  time 

46.70 
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Also  these  simulation  data  show  that  not  only  the  latency  approach  but 
also  other  new  approaches  implemented  in  SLATE  provide  savings  in  CPU  time. 
This  observation  is  obtained  by  noting  that  the  latency  exploitation  is 
smaller  than  the  savings  in  CPU  time.  These  other  new  approaches  which 
also  provide  savings  in  CPU  time  are  the  new  reordering  scheme  for  the 
modified  nodal  approach  presented  in  Chapter  2  and  the  piecewise  nonlinear 
approach  presented  in  Chapter  3-  The  new  reordering  scheme  for  the 
modified  nodal  approach  avoids  the  problem  of  pivoting  on  zero  diagonal 
elements  and  decreases  the  number  of  operations  at  the  same  time.  However, 
the  efficiency  provided  by  the  new  reordering  scheme  i3  problem-dependent. 
For  example,  if  the  circuit  does  not  have  voltage  sources  or  inductors, 
then  certainly  no  efficiency  can  be  obtained  by  using  our  approach.  The 
piecewise  nonlinear  approach  is  still  at  the  experimental  stage.  All  the 
examples  we  have  simulated  show  that  the  use  of  the  piecewise  nonlinear 
approach  hastens  the  convergence  and  improves  the  global  convergence 
property  of  the  Newton-Raphson  method  for  bipolar  and  MOS  circuits. 
However,  the  proof  of  global  convergence  or  the  conditions  for  global 
convergence  for  the  piecewise  nonlinear  approach  has  not  been  obtained. 
Further  research  is  needed  to  prove  the  global  convergence,  or  to  modify 
the  approach  we  proposed  to  ensure  global  convergence.  Also  more  work  is 
needed  to  study  if  the  strict  piecewise  nonlinear  approach  is  efficient,  and 
if  it  is  not  efficient,  then  the  problem  of  how  to  use  the  ideas  of  the 
piecewise  nonlinear  approach  to  hasten  the  convergence  and  to  improve  the 
global  convergence  property  of  the  Newton-Raphson  method  should  be  studied. 
The  solution  of  the  two  problems  of  numerical  integration  makes  the  program 
more  reliable  and  more  accurate.  This  is  described  in  Chapter  4.  An 
equation  was  presented  to  compute  the  upperbound  on  the  local  truncation 
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error  (LTE)  from  the  maximum  global  error  (CE  )  and  the  solution  time  T. 
The  inaccuracy  in  the  estimation  of  the  local  truncation  error  caused  by 
different  timesteps  was  resolved  by  introducing  a  new  formula  for  the 
estimation.  The  inaccuracy  in  the  estimation  of  the  local  truncation  error 
caused  by  using  the  node  voltages  at  timepoints  of  the  previous  switch 
interval  was  resolved  by  recognizing  this  situation  and  restarting  the 
numerical  integration  from  the  breakpoint. 

In  Chapter  5,  the  ideas  of  tearing  methods  were  detailed,  the  most 
efficient  way  of  implementing  the  node  tearing  method  was  determined 
theoretically  and  experimentally,  and  a  circuit  interpretation  of  tearing 
methods  was  given. 

In  Chapter  6,  four  latency  criteria  schemes  were  proposed.  The  first 
three  schemes:  Scheme  0,  Scheme  1  and  Scheme  2,  were  implemented  and 

tested.  From  the  simulation  data  we  conclude  that  Scheme  2  is  the  best  out 
of  these  three.  Scheme  3  is  still  under  investigation  and  we  think  it 
should  be  the  best  scheme  to  exploit  latency.  More  work  is  needed  to  study 
how  to  implement  this  scheme  efficiently  and  reliably,  and  to  find  out  if 
it  is  really  the  best  scheme. 

The  nested  subnetwork  approach  [41,42, 43]  is  the  approach  which  allows 
several  levels  of  subnetworks  and  in  which  the  latency  approach  is  used  at 
every  level  of  the  subnetworks.  This  approach  may  provide  savings  in  the 
time  3pent  in  checking  the  latent  status  of  subnetworks.  Only  latency 
exploitation  at  the  network  level  is  implemented  in  program  SLATE  and  we 
believe  that  this  checking  time  may  be  small,  thus  the  savings  in  CPU  time 
provided  by  the  nested  subnetworks  approach  probably  is  not  significant. 
However,  further  investigation  is  needed  to  yield  conclusive  results. 
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Device  characteristic  Latency  and  function  latency  are  two  concepts  which 
may  provide  some  more  savings  in  CPU  time.  More  investigations  need  to  be 
done  to  exploit  these  two  latencies. 

In  the  present  version  of  SLATE,  a  lot  of  information  which  is  not 
needed  is  still  stored  because  SLATE  evolved  from  YSPICE.  Due  to  this 
reason,  although  tearing  methods  should  provide  savings  in  memory,  no 
comparison  of  memory  usage  was  presented  in  this  thesis. 
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